Preface to the Second Edition


1. The first edition of this book was published in 1977. The text has been well 
received and is still used, although it has been out of print for some time. 


In the intervening three decades, a lot of interesting things have happened 
to mathematical logic: 


(i) Model theory has shown that insights acquired in the study of formal 
languages could be used fruitfully in solving old problems of conventional 
mathematics. 


(ii) Mathematics has been and is moving with growing acceleration from 
the set-theoretic language of structures to the language and intuition of 
(higher) categories, leaving behind old concerns about infinities: a new 
view of foundations is now emerging. 


(iii) Computer science, a no-nonsense child of the abstract computability 
theory, has been creatively dealing with old challenges and providing new 
ones, such as the P/NP problem. 


Planning additional chapters for this second edition, I have decided to focus 
on model theory, the conspicuous absence of which in the first edition was noted 
in several reviews, and the theory of computation, including its categorical and 
quantum aspects. 


The whole Part IV: Model Theory, is new. I am very grateful to 
Boris I. Zilber, who kindly agreed to write it. It may be read directly after 
Chapter II.


The contents of the first edition are basically reproduced here as 
Chapters I-VIII. Section IV.7, on the cardinality of the continuum, is 
completed by Section IV.7.3, discussing H. Woodin’s discovery. 


The new Chapter IX: Constructive Universe and Computation, was written 
especially for this edition, and I tried to demonstrate in it some basics of cate- 
gorical thinking in the context of mathematical logic. More detailed comments 
follow. 


I am grateful to Ronald Brown and Noson Yanofsky, who read prelimi- 
nary versions of new material and contributed much appreciated criticism and 
suggestions. 

2. Model theory grew from the same roots as other branches of logic: proof 
theory, set theory, and recursion theory. From the start, it focused on language 
and formalism. But the attention to the foundations of mathematics in model 
theory crystallized in an attempt to understand, classify, and study models of
theories of real-life mathematics. 


One of the first achievements of model theory was a sequence of local 
theorems of algebra proved by A. Maltsev in the late 1930s. They were based on 
the compactness theorem established by him for this purpose. The compactness 
theorem in many of its disguises remained a key model-theoretic instrument 
until the end of the 1950s. We follow these developments in the first two sec- 
tions of Chapter X, which culminate with a general discussion of nonstandard 
analysis discovered by A. Robinson. The third section introduces basic tools 
and concepts of the model theory of the 1960s: types, saturated models, and 
modern techniques based on these. 


We try to illustrate every new model-theoretic result with an application in 
“real” mathematics. In Section 4 we discuss an algebro-geometric theorem first 
proved by J. Ax model-theoretically and re-proved by G. Shimura and A. Borel. 
Moreover, we explain an application of the Tarski-Seidenberg quantifier elimination
for R due to L. Hörmander. A real gem of model-theoretic techniques
of the 1980s is the calculation by J. Denef of the Poincaré series counting 
p-adic points on a variety based on A. Macintyre’s quantifier elimination 
theorem for Qp. 


In the last two sections we present a survey of classification theory, which 
started with M. Morley’s analysis of theories categorical in uncountable powers 
in 1964, and was later expanded by S. Shelah and others to a scale that no one 
could have envisaged. 


The striking feature of these developments is the depth of the very abstract 
“pure” model theory underlying the classification, in combination with the 
diversity of mathematical theories affected by it, from algebraic and 
Diophantine geometry to real analysis and transcendental number theory. 


3. The formal languages with which we work in the first, and in most of 
the second, edition of this book are exclusively linear in the following sense. 
Having chosen an alphabet consisting of letters, we proceed to define classes 
of well-formed expressions in this alphabet that are some finite sequences of 
letters. At the next level, there appear well-formed sequences of words, such as 
deductions and descriptions. Church’s λ-calculus furnishes a good example of
strictures imposed by linearity. 


Nonlinear languages have existed for centuries. Geometers and 
composers could not perform without using the languages of drawings, resp. 
musical scores; when alchemy became chemistry, it also evolved its own 
two-dimensional language. For a logician, the basic problem about nonlinear 
languages is the difficulty of their formalization. 


This problem is addressed nowadays by relegating nonlinear languages of 
contemporary mathematics to the realm of more conventional mathematical 
objects, and then formally describing such languages as one would describe any 
other structure, that is, linearly. 
Such a strategy probably cannot be avoided. But one must be keenly aware
that some basic mathematical structures are “linguistic” at their core. Recog- 
nition or otherwise of this fact influences the problems that are chosen, the 
questions that are asked, and the answers that are appreciated. 


It would be difficult to dispute nowadays that category theory as a language 
is replacing set theory in its traditional role as the language of mathematics. 
Basic expressions of this language, commutative diagrams, are one-dimensional, 
but nonlinear: they are certain decorated graphs, whose topology is that of 
1-dimensional triangulated spaces. 


When one iterates the philosophy of category theory, replacing sets of 
morphisms by objects of a category of the next level, commutative diagrams 
become two-dimensional simplicial sets (or cell complexes), and so on. Arguably, 
in this way the whole of homotopy topology now develops into the language of 
contemporary mathematics, transcending its former role as an important and 
active, but reasonably narrow research domain. Much remains to be recognized 
and said about this emerging trend in foundations of mathematics. 


The first part of Chapter IX in this edition is a very brief and tentative 
introduction to this way of thinking, oriented primarily to some reshuffling of 
classical computability theory, as it was explained in Part II of the first edition.


4. The second part of the new Chapter IX is dedicated to some theoretical 
problems of classical and quantum computing. It introduces the P/NP problem, 
classical and quantum Boolean circuits, and presents several celebrated results 
of this early stage of theoretical quantum computing, such as Shor’s factoring 
and Grover’s search algorithms. 


The main reason to include these topics is my conviction that at least some 
theoretical achievements of modern computer science must constitute an organic 
part of contemporary mathematical logic. 


Already in the first edition, the manuscript for which was completed in 
September 1974, “quantum logic” was discussed at some length; cf. 
Section II.12.


A Russian version of Part II of the first edition was published as a separate
book, Computable and Uncomputable, by “Soviet Radio” in 1980. For this
Russian publication, I had written a new introduction, in which, in particular, 
I suggested that quantum computers could be potentially much more powerful 
than classical ones, if one could use the exponential growth of a quantum phase 
space as a function of the number of degrees of freedom of the classical system. 

When a mathematical implementation of this idea, massive quantum 
parallelism, made possible by quantum entanglement, gradually matured, I 
gave a talk at a Bourbaki seminar in June 1999, explaining the basic ideas and 
results. 

Chapter IX is a revised and expanded version of this talk. 

5. Finally, a few words about the last digression in Chapter II, “Truth as 
Value and Duty: Lessons of Mathematics.” 

“Mathematical truth” was the central concept of the first part of the book,
“Provability.” Writing this part, I felt that if I did not compensate somehow for the
aridity and sheer technicality of the analysis of formal languages, I would not be
able to convince people (the readers that I imagined, working mathematicians
like me) that it is worth studying at all. The literary device I used to struggle
with this feeling of helplessness was this: from time to time I allowed myself free 
associations, and wrote the outcome in a series of six digressions, with which 
the first two Chapters were interspersed. 


By the end of the second chapter, I realized that I was finally on the fertile 
soil of “real mathematics,” and the need for digressions faded away. 


Nevertheless, the whole of Part I was left without proper summary. 


Its role is now played by the “Last Digression,” published here for the first 
time. It is a slightly revised text of the talk prepared for a Balzan Foundation 
International Symposium on “Truth in the Humanities, Science and Religion” 
(Lugano, 2008), where I was the only mathematician speaker among philoso- 
phers, historians, lawyers, theologians, and physicists. I was confronted with the 
task to explain to a distinguished “general audience” what is so different about 
mathematical truth, and what light the usage of this word in mathematics can 
throw on its meaning in totally foreign environments. 


The main challenge was this: avoid sounding ponderous. 


Yu. Manin, Bonn December 31, 2008 


Preface to the First Edition


1. This book is above all addressed to mathematicians. It is intended to be a
textbook of mathematical logic on a sophisticated level, presenting the reader 
with several of the most significant discoveries of the last ten or fifteen years. 
These include the independence of the continuum hypothesis, the Diophantine 
nature of enumerable sets, and the impossibility of finding an algorithmic solu- 
tion for one or two old problems. 


All the necessary preliminary material, including predicate logic and the 
fundamentals of recursive function theory, is presented systematically and with 
complete proofs. We assume only that the reader is familiar with “naive” set- 
theoretic arguments. 


In this book mathematical logic is presented both as a part of mathematics 
and as the result of its self-perception. Thus, the substance of the book consists 
of difficult proofs of subtle theorems, and the spirit of the book consists of 
attempts to explain what these theorems say about the mathematical way of 
thought. 


Foundational problems are for the most part passed over in silence. Most 
likely, logic is capable of justifying mathematics to no greater extent than 
biology is capable of justifying life. 

2. The first two chapters are devoted to predicate logic. The presentation 
here is fairly standard, except that semantics occupies a very dominant position, 
truth is introduced before deducibility, and models of speech in formal languages 
precede the systematic study of syntax. 


The material in the last four sections of Chapter II is not completely 
traditional. In the first place, we use Smullyan’s method to prove Tarski’s the- 
orem on the undefinability of truth in arithmetic, long before the introduction 
of recursive functions. Later, in the seventh chapter, one of the proofs of the 
incompleteness theorem is based on Tarski’s theorem. In the second place, a 
large section is devoted to the logic of quantum mechanics and to a proof of 
von Neumann’s theorem on the absence of “hidden variables” in the quantum- 
mechanical picture of the world. 


The first two chapters together may be considered as a short course in logic 
apart from the rest of the book. Since the predicate logic has received the widest 
dissemination outside the realm of professional mathematics, the author has not 
resisted the temptation to pursue certain aspects of its relation to linguistics, 
psychology, and common sense. This is all discussed in a series of digressions, 
which, unfortunately, too often end up trying to explain “the exact meaning 
of a proverb” (E. Baratynsky).¹ This series of digressions ends with the second
chapter. 


The third and fourth chapters are optional. They are devoted to complete 
proofs of the theorems of Gödel and Cohen on the independence of the continuum
hypothesis. Cohen forcing is presented in terms of Boolean-valued models;
Gödel’s constructible sets are introduced as a subclass of von Neumann’s
universe. The number of omitted formal deductions does not exceed the 
accepted norm; due respects are paid to syntactic difficulties. This ends the 
first part of the book: “Provability.” 


The reader may skip the third and fourth chapters, and proceed immedi- 
ately to the fifth. Here we present elements of the theory of recursive functions 
and enumerable sets, formulate Church’s thesis, and discuss the notion of algo- 
rithmic undecidability. 

The basic content of the sixth chapter is a recent result on the Diophantine 
nature of enumerable sets. We then use this result to prove the existence 
of versal families, the existence of undecidable enumerable sets, and, in the 
seventh chapter, Gödel’s incompleteness theorem (as based on the definability of
provability via an arithmetic formula). Although it is possible to
disagree with this method of development, it has several advantages over earlier
treatments. In this version the main technical effort is concentrated on proving
the basic fact that all enumerable sets are Diophantine, and not on the more
specialized and weaker results concerning the set of recursive descriptions or
the Gödel numbers of proofs.


The last section of the sixth chapter stands somewhat apart from the rest. 
It contains an introduction to the Kolmogorov theory of complexity, which is 
of considerable general mathematical interest. 


The fifth and sixth chapters are independent of the earlier chapters, and 
together make up a short course in recursive function theory. They form the 
second part of the book: “Computability.” 


The third part of the book, “Provability and Computability,” relies heavily 
on the first and second parts. It also consists of two chapters. All of the seventh 
chapter is devoted to Gödel’s incompleteness theorem. The theorem appears
later in the text than is customary because of the belief that this central result 
can only be understood in its true light after a solid grounding both in formal 
mathematics and in the theory of computability. Hurried expositions, where
the proof that provability is definable is entirely omitted and the mathematical
content of the theorem is reduced to some version of the “liar paradox,” can
only create a distorted impression of this remarkable discovery. The proof is
considered from several points of view. We pay special attention to properties
which do not depend on the choice of Gödel numbering. Separate sections are
devoted to Feferman’s recent theorem on Gödel formulas as axioms, and to the
old but very beautiful result of Gödel on the length of proofs.


¹ Nineteenth-century Russian poet (translator’s note). The full poem is:

We diligently observe the world,
We diligently observe people,
And we hope to understand their deepest meaning.
But what is the fruit of long years of study?
What do the sharp eyes finally detect?
What does the haughty mind finally learn
At the height of all experience and thought,
What?—the exact meaning of an old proverb.


The eighth and final chapter is, in a way, removed from the theme of the 
book. In it we prove Higman’s theorem on groups defined by enumerable sets 
of generators and relations. The study of recursive structures, especially in 
group theory, has attracted continual attention in recent years, and it seems 
worthwhile to give an example of a result which is remarkable for its beauty 
and completeness. 


3. This book was written for very personal reasons. After several years or 
decades of working in mathematics, there almost inevitably arises the need to 
stand back and look at this research from the side. The study of logic is, to a 
certain extent, capable of fulfilling this need. 


Formal mathematics has more than a slight touch of self-caricature. Its 
structure parodies the most characteristic, if not the most important, features of 
our science. The professional topologist or analyst experiences a strange feeling 
when he recognizes the familiar pattern glaring out at him in stark relief. 

This book uses material arrived at through the efforts of many mathemati- 
cians. Several of the results and methods have not appeared in monograph 
form; their sources are given in the text. The author’s point of view has formed 
under the influence of the ideas of Hilbert, Gödel, Cohen, and especially John von
Neumann, with his deep interest in the external world, his open-mindedness 
and spontaneity of thought. 

Various parts of the manuscript have been discussed with
Yu. V. Matiyasevič, G. V. Čudnovskiĭ, and S. G. Gindikin. I am deeply grateful
to all of these colleagues for their criticism. 

W. D. Goldfarb of Harvard University very kindly agreed to proofread the 
entire manuscript. For his detailed corrections and laborious rewriting of part 
of Chapter IV, I owe a special debt of gratitude. 


I wish to thank Neal Koblitz for his meticulous translation. 


Yu. I. Manin Moscow, September 1974 


Introduction to Formal Languages


Gelegentlich ergreifen wir die Feder 
Und schreiben Zeichen auf ein weisses Blatt, 
Die sagen dies und das, es kennt sie jeder, 
Es ist ein Spiel, das seine Regeln hat. 

H. Hesse, “Buchstaben” 
We now and then take pen in hand 
And make some marks on empty paper. 
Just what they say, all understand. 
It is a game with rules that matter. 

H. Hesse, “Alphabet” 

(translated by Prof. Richard S. Ellis) 


1 General Information 


1.1. Let A be any abstract set. We call A an alphabet. Finite sequences of 
elements of A are called expressions in A. Finite sequences of expressions are 
called texts. 

We shall speak of a language with alphabet A if certain expressions and texts 
are distinguished (as being “correctly composed,” “meaningful,” etc.). Thus, in 
the Latin alphabet A we may distinguish English word forms and grammatically 
correct English sentences. The resulting set of expressions and texts is a working 
approximation to the intuitive notion of the “English language.” 

The language Algol 60 consists of distinguished expressions and texts in the 
alphabet {Latin letters} ∪ {digits} ∪ {logical signs} ∪ {separators}. Programs
are among the most important distinguished texts. 

In natural languages the set of distinguished expressions and texts usually 
has unsteady boundaries. The more formal the language, the more rigid these 
boundaries are. 

The rules for forming distinguished expressions and texts make up the syntax 
of the language. The rules that tell how they correspond with reality make 
up the semantics of the language. Syntax and semantics are described in a
metalanguage. 


1.2. “Reality” for the languages of mathematics consists of certain classes of 
(mathematical) arguments or certain computational processes using (abstract) 
automata. Corresponding to these designations, the languages are divided into 
formal and algorithmic languages. (Compare: in natural languages, the declarative
versus imperative moods, or, on the level of texts, statement versus
command.)

Different formal languages differ from one another, in the first place, by 
the scope of the formalizable types of arguments—their expressiveness; in the 
second place, by their orientation toward concrete mathematical theories; and 
in the third place, by their choice of elementary modes of expression (from 
which all others are then synthesized) and written forms for them. 

In the first part of this book a certain class of formal languages is examined 
systematically. Algorithmic languages are brought in episodically. 

The “language-parole” dichotomy, which goes back to Humboldt and 
Saussure, is as relevant to formal languages as to natural languages. In §3 of 
this chapter we give models of “speech” in two concrete languages, based on set 
theory and arithmetic, respectively, because, as many believe, habits of speech 
must precede the study of grammar. 

The language of set theory is among the richest in expressive means, despite 
its extreme economy. In principle, a formal text can be written in this language 
corresponding to almost any segment of modern mathematics—topology, func- 
tional analysis, algebra, or logic. 

The language of arithmetic is one of the poorest, but its expressive possi- 
bilities are sufficient for describing all of elementary arithmetic, and also for 
demonstrating the effects of self-reference à la Gödel and Tarski.


1.3. As a means of communication, discovery, and codification, no formal 
language can compete with the mixture of mathematical argot and formulas 
that is common to every working mathematician. 

However, because they are so rigidly normalized, formal texts can 
themselves serve as an object for mathematical investigation. The results of 
this investigation are themselves theorems of mathematics. They arouse great 
interest (and strong emotions) because they can be interpreted as theorems 
about mathematics. But it is precisely the possibility of these and still broader 
interpretations that determines the general philosophical and human value of 
mathematical logic. 


1.4. We have agreed that the expressions and texts of a language are elements 
of certain abstract sets. In order to work with these elements, we must some- 
how fix them materially. In the modern European tradition (as opposed to the 
ancient Babylonian tradition, or the latest American tradition, using computer 
memory), the following notation is customary. The elements of the alphabet are 
indicated by certain symbols on paper (letters of different kinds of type, digits, 
additional signs, and also combinations of these). An expression in an alphabet
A is written in the form of a sequence of symbols, read from left to right, with 
hyphens when necessary. A text is written as a sequence of written expressions, 
with spaces or punctuation marks between them. 


1.5. If written down, most of the interesting expressions and texts in a formal 
language either would be physically extremely long, or else would be psycho- 
logically difficult to decipher and learn in an acceptable amount of time, or 
both. 

They are therefore replaced by “abbreviated notation” (which can sometimes
turn out to be physically longer). The expression “xxxxxx” can be briefly
written “x...x (six times)” or “x⁶.” The expression “∀z(z ∈ x ⇔ z ∈ y)” can
be briefly written “x = y.” Abbreviated notation can also be a way of denoting
any expression of a definite type, not only a single such expression (any
expression 101010...10 can be briefly written “the sequence of length 2n with ones
in odd places and zeros in even places” or “the binary expansion of (2/3)(4ⁿ − 1)”).
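The last abbreviation can be checked mechanically. The following minimal Python sketch is not part of the original text, and its function names are illustrative. It compares the expression 1010...10 of length 2n with the closed form, read here as (2/3)(4ⁿ − 1).

```python
def alternating_expression(n: int) -> str:
    """Return the expression 1010...10 of length 2n (ones in odd places)."""
    return "10" * n


def closed_form(n: int) -> int:
    """Value named by the abbreviated notation, assumed to be (2/3)(4^n - 1)."""
    # 4^n - 1 is always divisible by 3, so the integer division is exact.
    return 2 * (4**n - 1) // 3


# Compare the two descriptions for the first few lengths.
for n in range(1, 11):
    assert int(alternating_expression(n), 2) == closed_form(n)
```

Both descriptions name the same object, but only the first is an expression in the alphabet {0, 1}; the second is a statement in the metalanguage about that expression.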

Ever since our tradition started, with Viète, Descartes, and Leibniz, abbreviated
notation has served as an inexhaustible source of inspiration and errors.
There is no sense in, or possibility of, trying to systematize its devices; they
bear the indelible imprint of the fashion and spirit of the times, the artistry and
pedantry of the authors. The symbols Σ, ∫, ∈ are classical models worthy of
imitation. Frege’s notation, now forgotten, for “P and Q” (actually “not [if P,
then not Q],” whence the asymmetry):


[Diagram: Frege’s two-dimensional notation for “P and Q”; not recoverable from the source.]


shows what should be avoided. In any case, abbreviated notation permeates
mathematics.

The reader should become used to the trinity


                     formal text
                    /           \
        written text ----------- interpretation of text,


which replaces the unconscious identification of a statement with its form and
its sense, as one of the first priorities in his study of logic.


2 First-Order Languages


In this section we describe the most important class of formal languages
ℒ₁, the first-order languages, and give two concrete representatives of this
class: the Zermelo-Fraenkel language of set theory L₁Set, and the Peano
language of arithmetic L₁Ar. Another name for ℒ₁ is predicate languages.


2.1. The alphabet of any language in the class ℒ₁ is divided into six disjoint
subsets. The following table lists the generic name for the elements in each
subset, the standard notation for these elements in the general case, and the special
notation used in this book for the languages L₁Set and L₁Ar. We then describe
the rules for forming distinguished expressions and briefly discuss semantics.

The distinguished expressions of any language L in the class ℒ₁ are divided
into two types: terms and formulas. Both types are defined recursively.


2.2. Definition. Terms are the elements of the least subset of the expressions 
of the language that satisfies the following two conditions: 


(a) Variables and constants are (atomic) terms. 
(b) If f is an operation of degree r and t₁, ..., tᵣ are terms, then f(t₁, ..., tᵣ)
is a term.


In (a) we identify an element with a sequence of length one. The alphabet
does not include commas, which are part of our abbreviated notation:
f(t₁, t₂, t₃) means the same as f(t₁t₂t₃). In §1 of Chapter II we explain how a
sequence of terms can be uniquely deciphered despite the absence of commas.

If two sets of expressions in the language satisfy conditions (a) and (b), 
then the intersection of the two sets also satisfies these conditions. Therefore 
the definition of the set of terms is correct. 
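
The recursive definition of terms can be mirrored in a small Python sketch. It is only an illustration under stated assumptions: the class names `Atom` and `Apply`, the table `DEGREE` (filled in with the two operations of the language of arithmetic), and the helper functions are hypothetical and do not come from the text. The function `normalized` writes a term without commas, as in the alphabet itself.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """A variable or a constant: an atomic term, condition (a)."""
    symbol: str


@dataclass(frozen=True)
class Apply:
    """An operation symbol applied to a sequence of terms, condition (b)."""
    op: str
    args: Tuple["Term", ...]


Term = Union[Atom, Apply]

# Assumed signature for illustration: operation symbol -> degree.
DEGREE = {"+": 2, "·": 2}


def is_term(t) -> bool:
    """Check conditions (a) and (b) recursively (any symbol is accepted as an atom)."""
    if isinstance(t, Atom):
        return True
    if isinstance(t, Apply):
        return DEGREE.get(t.op) == len(t.args) and all(is_term(a) for a in t.args)
    return False


def normalized(t: Term) -> str:
    """Write the term as a sequence of letters of the alphabet, with no commas."""
    if isinstance(t, Atom):
        return t.symbol
    return t.op + "(" + "".join(normalized(a) for a in t.args) + ")"


two = Apply("+", (Atom("1"), Atom("1")))
three = Apply("+", (two, Atom("1")))
print(normalized(three))  # +(+(11)1)
```

An expression such as `Apply("+", (Atom("1"),))` is rejected by `is_term`, because the number of arguments does not match the degree of the operation.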


Language Alphabets


Connectives and quantifiers (all languages):
    ⇔ (equivalent); ⇒ (implies); ∨ (inclusive or); ∧ (and); ¬ (not);
    ∀ (universal quantifier); ∃ (existential quantifier)

Variables (all languages):
    x, y, z, u, v, ... with indices

Constants:
    General:   c, ... with indices
    In L₁Set:  ∅ (empty set)
    In L₁Ar:   0 (zero); 1 (one)

Operations of degree 1, 2, 3, ...:
    General:   f, g, ... with indices
    In L₁Set:  none
    In L₁Ar:   + (addition, degree 2); · (multiplication, degree 2)

Relations (predicates) of degree 1, 2, 3, ...:
    General:   p, q, ... with indices
    In L₁Set:  ∈ (is an element of, degree 2); = (equals, degree 2)
    In L₁Ar:   = (equality, degree 2)

Parentheses (all languages):
    ( (left parenthesis); ) (right parenthesis)


2.3. Definition. Formulas are the elements of the least subset of the expressions 
of the language that satisfies the following two conditions: 


(a) If p is a relation of degree r and t₁, ..., tᵣ are terms, then p(t₁, ..., tᵣ) is an
(atomic) formula.
(b) If P and Q are formulas (abbreviated notation!), and x is a variable, then
the expressions


(P) ⇔ (Q), (P) ⇒ (Q), (P) ∨ (Q), (P) ∧ (Q), ¬(P), ∀x(P), ∃x(P)


are formulas.


It is clear from the definitions that any term is obtained from atomic terms 
in a finite number of steps, each of which consists in “applying an operation 
symbol” to the earlier terms. The same is true for formulas. In Chapter II, §1
we make this remark more precise. 
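
The remark that every term arises from atomic terms in finitely many steps can be made concrete. The Python sketch below builds the terms of the language of arithmetic stage by stage, in normalized notation without commas. The choice of atoms (the constants 0 and 1 and a single variable x) and the names `next_stage` and `terms_up_to` are assumptions made for illustration only.

```python
from itertools import product

ATOMS = ["0", "1", "x"]        # two constants and one variable (illustrative choice)
OPERATIONS = {"+": 2, "·": 2}  # operation symbol -> degree


def next_stage(terms: set) -> set:
    """Add every expression f(t_1...t_r) whose arguments already lie in `terms`."""
    new = set(terms)
    for op, degree in OPERATIONS.items():
        for args in product(sorted(terms), repeat=degree):
            new.add(op + "(" + "".join(args) + ")")
    return new


def terms_up_to(steps: int) -> set:
    """All terms obtainable from the atoms in at most `steps` applications."""
    stage = set(ATOMS)
    for _ in range(steps):
        stage = next_stage(stage)
    return stage


print(len(terms_up_to(1)))           # 21: 3 atoms plus 2 * 3 * 3 compound terms
print("+(+(11)1)" in terms_up_to(2))  # True
```

The set of all terms is the union of these stages, which is one way to see why it is the least set satisfying conditions (a) and (b).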

The following initial interpretations of terms and formulas are given for 
the purpose of orientation and belong to the so-called “standard models” (see 
Chapter II, §2 for the precise definitions). 


2.4. EXAMPLES AND INTERPRETATIONS 


(a) The terms stand for (are notation for) the objects of the theory. Atomic
terms stand for indeterminate objects (variables) or concrete objects (constants).
The term f(t₁, ..., tᵣ) is the notation for the object obtained by applying
the operation denoted by f to the objects denoted by t₁, ..., tᵣ. Here are
some examples from L₁Ar:


0 denotes zero;
1 denotes one;
+(1, 1) denotes two (1 + 1 = 2 in the usual notation);
+(+(1, 1), 1) denotes three;
·(+(1, 1), +(1, 1)) denotes four (2 × 2 = 4).


Since this normalized notation is different from what we are used to in arithmetic,
in L₁Ar we shall usually write simply t₁ + t₂ instead of +(t₁, t₂) and
t₁ · t₂ instead of ·(t₁, t₂). This convention may be considered as another use of
abbreviated notation:


x stands for an indeterminate integer;
x + 1 (or +(x, 1)) stands for the next integer.
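
The informal readings above can be imitated by a short evaluator. This is a sketch under assumptions, not the precise definition (which the text defers to Chapter II). A term is encoded as a string (a constant or a variable) or as a tuple `(op, left, right)`; the function name `evaluate` and this encoding are hypothetical.

```python
def evaluate(term, assignment=None):
    """Value of a term of the language of arithmetic in the usual integers.

    `assignment` maps variable names to integers; the constants are "0" and "1".
    """
    assignment = assignment or {}
    if isinstance(term, str):
        if term in ("0", "1"):
            return int(term)
        return assignment[term]  # a variable denotes whatever is assigned to it
    op, left, right = term
    a = evaluate(left, assignment)
    b = evaluate(right, assignment)
    if op == "+":
        return a + b
    if op == "·":
        return a * b
    raise ValueError(f"unknown operation symbol: {op!r}")


TWO = ("+", "1", "1")
THREE = ("+", TWO, "1")
FOUR = ("·", TWO, TWO)

print(evaluate(FOUR))                        # 4
print(evaluate(("+", "x", "1"), {"x": 6}))   # 7
```

A term containing a variable has no value until the variable is assigned one, which matches the reading of x as an indeterminate integer.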


In the language L₁Set all terms are atomic:


x stands for an indeterminate set; 


∅ stands for the empty set.

(b) The formulas stand for statements (arguments, propositions, ...) of the
theory. When translated into formal language, a statement may be
either true, false, or indeterminate (if it concerns indeterminate objects); see
Chapter II for the precise definitions. In the general case the atomic formula
p(t₁, ..., tᵣ) has roughly the following meaning: “The ordered r-tuple of objects
denoted by t₁, ..., tᵣ has the property denoted by p.” Atomic formulas in L₁Ar
have the general structure =(t₁, t₂), or, in nonnormalized notation, t₁ = t₂.


Some atomic formulas in L₁Set:


y ∈ x (y is an element of x),


and also ∅ ∈ y, x ∈ ∅, etc. Of course, normalized notation must have the form
∈(xy), and so on.

Some nonatomic formulas: 


∃x(∀y(¬(y ∈ x))): there exists an x of which no y is an element.


Informally this means: “The empty set exists.” We once again recall that an 
informal interpretation presupposes some standard interpretive system, which 
will be introduced explicitly in Chapter II. 


∀y(y ∈ z ⇒ y ∈ x): z is a subset of x.


This is an example of a very useful type of abbreviated notation: four paren- 
theses are omitted in the formula on the left. We shall not specify precisely 
when parentheses may be omitted; in any case, it must be possible to reinsert 
them in a way that is unique or is clear from the context without any special 
effort. 
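
To see how such formulas acquire truth values, one can let the quantifiers range over a small finite collection of sets. The Python sketch below does this for the two nonatomic formulas above. It is a finite approximation chosen for illustration, not the standard interpretation introduced in Chapter II; the tuple encoding of formulas and the names `holds`, `UNIVERSE`, `SUBSET`, and `EMPTY_SET_EXISTS` are assumptions.

```python
def holds(formula, env, universe):
    """Truth of a formula when quantifiers range over the finite list `universe`."""
    kind = formula[0]
    if kind == "in":  # ("in", a, b) encodes the atomic formula a ∈ b
        return env[formula[1]] in env[formula[2]]
    if kind == "not":
        return not holds(formula[1], env, universe)
    if kind == "implies":
        return (not holds(formula[1], env, universe)) or holds(formula[2], env, universe)
    if kind in ("forall", "exists"):
        _, var, body = formula
        results = (holds(body, {**env, var: value}, universe) for value in universe)
        return all(results) if kind == "forall" else any(results)
    raise ValueError(f"unknown formula: {formula!r}")


# A tiny universe of hereditarily finite sets.
empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
UNIVERSE = [empty, one, two]

# ∀y(y ∈ z ⇒ y ∈ x): z is a subset of x.
SUBSET = ("forall", "y", ("implies", ("in", "y", "z"), ("in", "y", "x")))
# ∃x(∀y(¬(y ∈ x))): the empty set exists.
EMPTY_SET_EXISTS = ("exists", "x", ("forall", "y", ("not", ("in", "y", "x"))))

print(holds(SUBSET, {"z": one, "x": two}, UNIVERSE))  # True
print(holds(SUBSET, {"z": two, "x": one}, UNIVERSE))  # False
print(holds(EMPTY_SET_EXISTS, {}, UNIVERSE))          # True
```

The subset formula has free variables x and z, so its truth value depends on the sets assigned to them; the second formula has none and is simply true in this universe.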

We again emphasize: abbreviated notations for formulas are only material
designations. Abbreviated notation is chosen for the most part with psycholog- 
ical goals in mind: speed of reading (possibly with a loss in formal uniqueness), 
tendency to encourage useful associations and discourage harmful ones, suit- 
ability to the habits of the author and reader, and so on. The mathematical 
objects in the theory of formal languages are the formulas themselves, and not 
any particular designations. 


Digression: Names


On several occasions we have said that a certain object (a sign on paper, an 
element of an alphabet as an abstract set, etc.) is a notation for, or denotes, 
another element. A convenient general term for this relationship is naming. 

The letter x is the name of an element of the alphabet; when it appears in 
a formula, it becomes the name of a set or a number; the notation x € y is the 
name of an expression in the alphabet A, and this expression, in turn, is the 
name of an assertion about indeterminate sets; and so on. 

When we form words, we often identify the names of objects with the objects 
themselves: we say “the variable x,” “the formula P,” “the set z.” This can 
sometimes be dangerous. The following passage from Rosser’s book Logic for 
Mathematicians points up certain hidden pitfalls: 


The gist of the matter is that, if we have a statement such as “3 is greater
than 9/12” about the rational number 9/12 and containing a name “9/12” of
this rational number, one can replace this name by any other name of
the same rational number, for instance, “3/4.” If we have a statement
such as “3 divides the denominator of ‘9/12’ ” about a name of a rational
number and containing a name of this name, one can replace this name
of the name by some other name of the same name, but not in general
by the name of some other name, if it is a name of some other name of
the same rational number.


Rosser adds that “failure to observe such distinctions carefully can seldom 
lead to confusion in logic and still more seldom in mathematics.” However, 
these distinctions play a significant role in philosophy and in mathematical 
practice. 

“A rose by any other name would smell as sweet”—this is true because 
roses exist outside of us and smell in and of themselves. But, for example, it 
seems that Hilbert spaces “exist” only insofar as we talk about them, and the 
choice of terminology here makes a difference. The word “space” for the set 
of equivalence classes of square integrable functions was at the same time a 
codeword for an entire circle of intuitive ideas concerning “real” spaces. This 
word helped organize the concept and led it in the right direction. 

A successfully chosen name is a bridge between scientific knowledge and 
common sense, between new experience and old habits. The conceptual foun- 
dation of any science consists of a complicated network of names of things, 
names of ideas, and names of names. It evolves itself, and its projection on 
reality changes. 


3 Beginners’ Course in Translation 
3.1. We recall that the formulas in $L_1\mathrm{Set}$ stand for statements about sets; the
formulas in $L_1\mathrm{Ar}$ stand for statements about natural numbers; these formulas
contain names of sets and numbers, which may be indeterminate.




In this section we give the first basic examples of two-way translation
“argot $\Leftrightarrow$ formal language.” One of our purposes will be to indicate the great
expressive possibilities in $L_1\mathrm{Set}$ and $L_1\mathrm{Ar}$, despite the extremely limited modes
of expression.

As in the case of natural languages, this translation cannot be given by rigid 
rules, is not uniquely determined, and is a creative process. Compare Hesse’s 
quatrain with its translation in the epigraph to this book: the most important 
aim of translation is to “understand ... just what they say.” 

Before reading further, the reader should look through the appendix to
Chapter II: “The von Neumann Universe.” The semantics implicit in $L_1\mathrm{Set}$
relates to this universe, and not to arbitrary “Cantor” sets.

A more complete picture of the meaning of the formulas can be obtained 
from §2 of Chapter II. 


Translation from $L_1\mathrm{Set}$ to argot.


3.2. $\forall x(\neg(x \in \varnothing))$: “for all (sets) x it is false that x is an element of (the set)
$\varnothing$” (or “$\varnothing$ is the empty set”).

The second assertion is equivalent to the first only in the von Neumann 
universe, where the elements of sets can only be sets, and not real numbers, 
chairs, or atoms. 


3.3. $\forall z(z \in x \Leftrightarrow z \in y) \Leftrightarrow x = y$: “if for all z it is true that z is an element of
x if and only if z is an element of y, then it is true that x coincides with y; and
conversely,” or “a set is uniquely determined by its elements.”

In the expression 3.3 at least six parentheses have been omitted; and the
subformulas $z \in x$, $z \in y$, $x = y$ have not been normalized according to the
rules of $\mathcal{L}_1$.


3.4. $\forall u\, \forall v\, \exists x\, \forall z(z \in x \Leftrightarrow (z = u \vee z = v))$: “for any two sets u, v there exists
a third set x such that u and v are its only elements.”

This is one of the axioms of Zermelo-Fraenkel. The set x is called the
“unordered pair of sets u, v” and is denoted {u, v} in the appendix. 


3.5. $\forall y\, \forall z(((z \in y \wedge y \in x) \Rightarrow z \in x) \wedge (y \in x \Rightarrow \neg(y \in y)))$: “the set x is
partially ordered by the relation $\in$ between its elements.”

We mechanically copied the condition $y \in x \Rightarrow \neg(y \in y)$ from the definition
of partial ordering. This condition is automatically fulfilled in the von Neumann 
universe, where no set is an element of itself. 

A useful exercise would be to write the following formulas: 


“x is totally ordered by the relation $\in$”;
“x is linearly ordered by the relation $\in$”;
“x is an ordinal.”

3.6. $\forall x(y \in z)$: The literal translation “for all x it is true that y is an element
of z” sounds a little strange. The formula $\forall x\, \exists x(y \in z)$, which agrees with the
rules for constructing formulas, looks even worse. It would be possible to make
the rules somewhat more complicated, in order to rule out such formulas, but
in general they cause no harm. In Chapter II we shall see that from the point
of view of “truth” or “deducibility,” such a formula is equivalent to the formula
$y \in z$. It is in this way that they must be understood.


Translation from argot to $L_1\mathrm{Set}$.


We choose several basic constructions having general mathematical significance
and show how they are realized in the von Neumann universe, which
contains only sets obtained from $\varnothing$ by the process of “collecting into a set,”
and in which all relations must be constructed from $\in$.


3.7. “x is the direct product $y \times z$.”
This means that the elements of x are the ordered pairs of elements of y
and z, respectively. The definition of an unordered pair is obvious: the formula

$$\forall u(u \in x \Leftrightarrow (u = y_1 \vee u = z_1))$$

“means,” or may be briefly written in the form, $x = \{y_1, z_1\}$ (compare 3.4). The
ordered pair of $y_1$ and $z_1$ is introduced using a device of Kuratowski and Wiener:
this is the set $x_1$ whose elements are the unordered pairs $\{y_1, y_1\}$ and $\{y_1, z_1\}$.
We thus arrive at the formula

$$\exists y_2\, \exists z_2(\text{“}x_1 = \{y_2, z_2\}\text{”} \wedge \text{“}y_2 = \{y_1, y_1\}\text{”} \wedge \text{“}z_2 = \{y_1, z_1\}\text{”}),$$

which will be abbreviated

$$x_1 = \langle y_1, z_1 \rangle$$

and will be read “$x_1$ is the ordered pair with first element $y_1$ and second element
$z_1$.” The abbreviated notation for the subformulas is in quotes; we shall later
omit the quotation marks.
Finally, the statement “$x = y \times z$” may be written in the form

$$\forall x_1(x_1 \in x \Leftrightarrow \exists y_1\, \exists z_1(y_1 \in y \wedge z_1 \in z \wedge \text{“}x_1 = \langle y_1, z_1 \rangle\text{”})).$$
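
The Kuratowski and Wiener device can be tried out on small hereditarily finite sets. The following is only a sketch in Python, assuming that `frozenset` values stand in for elements of the von Neumann universe; the names `pair`, `ordered_pair`, `product`, `EMPTY`, `ONE`, and `TWO` are hypothetical helpers introduced for illustration and are not notation from the text.

```python
def pair(a, b):
    """Unordered pair {a, b}; it has a single element when a == b."""
    return frozenset({a, b})

def ordered_pair(a, b):
    """Kuratowski-Wiener ordered pair <a, b> = {{a, a}, {a, b}}."""
    return pair(pair(a, a), pair(a, b))

def product(y, z):
    """Direct product y x z: the set of all ordered pairs <y1, z1>."""
    return frozenset(ordered_pair(y1, z1) for y1 in y for z1 in z)

# Small sets built from the empty set by "collecting into a set".
EMPTY = frozenset()
ONE = frozenset({EMPTY})          # the set whose only element is EMPTY
TWO = frozenset({EMPTY, ONE})     # the set with elements EMPTY and ONE

# The order of the components matters, unlike for unordered pairs.
print(ordered_pair(EMPTY, ONE) != ordered_pair(ONE, EMPTY))
print(len(product(TWO, TWO)))
```

When both components coincide, the pair collapses to a one-element set containing a one-element set, which is consistent with the definition because {a, a} has only one element.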


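In order to remind the reader for the last time of the liberties taken in
abbreviated notation, we write this same formula adhering to all the canons
of $\mathcal{L}_1$:

$$\begin{aligned}
&\forall x_1((\in(x_1 x)) \Leftrightarrow (\exists y_1(\exists z_1(((\in(y_1 y)) \wedge (\in(z_1 z))) \wedge (\exists y_2(\exists z_2(((\forall u((\in(u x_1)) \\
&\Leftrightarrow ((=(u y_2)) \vee (=(u z_2))))) \wedge (\forall u((\in(u y_2)) \Leftrightarrow (=(u y_1))))) \\
&\wedge (\forall u((\in(u z_2)) \Leftrightarrow ((=(u y_1)) \vee (=(u z_1))))))))))))
\end{aligned}$$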

EXERCISE: Find the open parenthesis corresponding to the fifth closed parenthesis
from the end. In §1 of Chapter II we give an algorithm for solving such
problems.


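3.8. “f is a mapping from the set u to the set v.”

First of all, mappings, or functions, are identified with their graphs; otherwise,
we would not be able to consider them as elements of the universe. The
following formula successively imposes three conditions on f: f is a subset of
$u \times v$; the projection of f onto u coincides with all of u; and each element of u
corresponds to exactly one element of v:

$$\begin{aligned}
&\forall z(z \in f \Rightarrow (\exists u_1\, \exists v_1(u_1 \in u \wedge v_1 \in v \wedge \text{“}z = \langle u_1, v_1 \rangle\text{”}))) \\
&\wedge \forall u_1(u_1 \in u \Rightarrow \exists v_1\, \exists z(v_1 \in v \wedge \text{“}z = \langle u_1, v_1 \rangle\text{”} \wedge z \in f)) \\
&\wedge \forall u_1\, \forall v_1\, \forall v_2(\exists z_1\, \exists z_2(z_1 \in f \wedge z_2 \in f \wedge \text{“}z_1 = \langle u_1, v_1 \rangle\text{”} \wedge \text{“}z_2 = \langle u_1, v_2 \rangle\text{”}) \Rightarrow v_1 = v_2).
\end{aligned}$$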


EXERCISE: Write the formula “f is the projection of y x z onto z.” 
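
The three conditions imposed on f in 3.8 can be checked mechanically when u and v are small finite sets. The sketch below is an illustration only: it assumes that the graph is given as a Python set of 2-tuples rather than of Kuratowski pairs, and `is_mapping` is a hypothetical name.

```python
def is_mapping(f, u, v):
    """Check the three conditions of 3.8 for a graph f given as a set of (u1, v1) tuples."""
    # (1) f is a subset of u x v.
    subset = all(u1 in u and v1 in v for (u1, v1) in f)
    # (2) The projection of f onto u coincides with all of u.
    total = all(any((u1, v1) in f for v1 in v) for u1 in u)
    # (3) No element of u corresponds to two different elements of v.
    single = all(v1 == v2 for (u1, v1) in f for (u2, v2) in f if u1 == u2)
    return subset and total and single

print(is_mapping({(1, "a"), (2, "a")}, {1, 2}, {"a", "b"}))
print(is_mapping({(1, "a"), (1, "b"), (2, "a")}, {1, 2}, {"a", "b"}))
```

Each Python quantifier (`all`, `any`) ranges over a finite set here, which is what makes the check effective; the formula itself places no such restriction on u and v.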


3.9. “x is a finite set.” 

Finiteness is far from being a primitive concept. Here is Dedekind’s defini- 
tion: “there does not exist a one-to-one mapping f of the set x onto a proper 
subset.” The formula: 


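$$\begin{aligned}
&\neg \exists f(\text{“f is a mapping from x to x”} \wedge \forall u_1\, \forall u_2\, \forall v_1\, \forall v_2((\text{“}\langle u_1, v_1 \rangle \in f\text{”} \\
&\quad \wedge \text{“}\langle u_2, v_2 \rangle \in f\text{”} \wedge u_1 \neq u_2) \Rightarrow v_1 \neq v_2) \wedge \exists v_1(v_1 \in x \wedge \neg \exists u_1(\text{“}\langle u_1, v_1 \rangle \in f\text{”}))).
\end{aligned}$$

The abbreviation “$\langle u_1, v_1 \rangle \in f$” means, of course, $\exists y(\text{“}y = \langle u_1, v_1 \rangle\text{”} \wedge y \in f)$.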


3.10. “x is a nonnegative integer.” 
The natural numbers are represented in the von Neumann universe by the 
finite ordinals, so that the required formula has the form 


“x is totally ordered by the relation $\in$” $\wedge$ “x is finite.”


EXERCISE: Figure out how to write the formulas “$x + y = z$” and “$x \cdot y = z$,”
where x, y, z are integers $\geq 0$.


After this it is possible in the usual way to write the formulas “x is an
integer,” “x is a rational number,” “x is a real number” (following Cantor or
Dedekind), etc., and then construct a formal version of analysis. The written
statements will have acceptable length only if we periodically extend the language
$L_1\mathrm{Set}$ (see §8 of Chapter II). For example, in $L_1\mathrm{Set}$ we are not allowed
to write term-names for the numbers 1, 2, 3, ... ($\varnothing$ is the name for 0), although
we may construct the formulas “x is the finite ordinal containing 1 element,”
“x is the finite ordinal containing 2 elements,” etc. If we use such roundabout
methods of expression, the simplest numerical identities become incredibly long;
but of course, in logic we are mainly concerned with the theoretical possibility
of writing them.


3.11. “x is a topological space.”

In the formula we must give the topology of x explicitly. We define the
topology, for example, in terms of the set y of all open subsets of x. We first
write that y consists of subsets of x and contains x and the empty set:

$$P_1:\ \forall z(z \in y \Rightarrow \forall u(u \in z \Rightarrow u \in x)) \wedge x \in y \wedge \varnothing \in y.$$

The intersection w of any two elements u, v in y is open, i.e., belongs to y:

$$P_2:\ \forall u\, \forall v\, \forall w((u \in y \wedge v \in y \wedge \forall z((z \in u \wedge z \in v) \Leftrightarrow z \in w)) \Rightarrow w \in y).$$

It is harder to write “the union of any set of open subsets is open.” We first
write

$$P_3:\ \forall u(u \in z \Leftrightarrow \forall v(v \in u \Rightarrow v \in y)),$$

that is, “z is the set of all subsets of y.” Then

$$P_4:\ \forall u\, \forall w((u \in z \wedge \forall v_1(v_1 \in w \Leftrightarrow \exists v(v \in u \wedge v_1 \in v))) \Rightarrow w \in y).$$

This means (taking into account $P_3$, which defines z): “If u is any subset of y,
i.e., a set of open subsets of x, then the union w of all these subsets belongs
to y, i.e., is open.” Now the final formula may be written as follows:

$$P_1 \wedge P_2 \wedge \forall z(P_3 \Rightarrow P_4).$$


The following comments on this formula will be reflected in precise definitions
in Chapter II, §§1 and 2. The letters x, y have the same meaning in all the
$P_i$, while z plays different roles: in $P_1$ it is a subset of x, and in $P_3$ and $P_4$ it is
the set of subsets of y. We are allowed to do this because as soon as we “bind”
z by the quantifier $\forall$, say in $P_1$, z no longer stands for an (indeterminate)
individual set, and becomes a temporary designation for “any set.” Where the
“scope of action” of $\forall$ ended, z can be given a new meaning. In order to “free”
z for later use, $\forall z$ was also put before $P_3 \Rightarrow P_4$.
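
For a finite set x the four conditions can be verified by exhaustive search. The following sketch assumes that y is supplied as a collection of Python sets; `subsets` and `is_topology` are hypothetical helper names, and the comments indicate which of the conditions each line is meant to mirror.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_topology(x, y):
    """Check the first two conditions and the union condition for finite x and a family y."""
    x = frozenset(x)
    y = frozenset(frozenset(o) for o in y)
    # First condition: y consists of subsets of x and contains x and the empty set.
    p1 = all(o <= x for o in y) and x in y and frozenset() in y
    # Second condition: the intersection of any two elements of y belongs to y.
    p2 = all((u & v) in y for u in y for v in y)
    # Third and fourth conditions: z is the set of all subsets of y,
    # and the union w of every u in z belongs to y.
    p4 = all(frozenset().union(*u) in y for u in subsets(y))
    return p1 and p2 and p4

print(is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}]))
print(is_topology({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}]))
```

The second family fails only the union condition, since the union of {1} and {2} is missing from it.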


Translation from argot to $L_1\mathrm{Ar}$.


3.12. “$x < y$”: $\exists z(y = (x + z) + 1)$. Recall that the variables are names for
nonnegative integers.


3.13. “x is a divisor of y”: $\exists z(y = x \cdot z)$.
3.14. “x is a prime number”: “$1 < x$” $\wedge\ \forall y($“y is a divisor of x” $\Rightarrow (y = 1 \vee y = x))$.
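
The translations 3.12 to 3.14 use only addition, multiplication, and quantifiers over nonnegative integers, so they can be evaluated directly once each quantifier is given a finite search range. The sketch below makes that assumption explicit in the comments; the function names `less_than`, `divides`, and `is_prime` are hypothetical.

```python
def less_than(x, y):
    """3.12: there exists z with y = (x + z) + 1; any witness satisfies z < y."""
    return any(y == (x + z) + 1 for z in range(y))

def divides(x, y):
    """3.13: there exists z with y = x * z; a witness can be found with z <= y."""
    return any(y == x * z for z in range(y + 1))

def is_prime(x):
    """3.14: 1 < x, and every divisor y of x equals 1 or x; divisors of x > 0 are <= x."""
    return less_than(1, x) and all(
        (not divides(y, x)) or y == 1 or y == x for y in range(x + 1))

print([x for x in range(30) if is_prime(x)])
```

The bounds on the search ranges are not part of the formulas; they are justified separately for each predicate, which is why the unbounded quantifiers of the formal language can be replaced by finite loops in this particular case.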


3.15. “Fermat’s last theorem”: $\forall x_1\, \forall x_2\, \forall x_3\, \forall u(\text{“}2 < u\text{”} \wedge \text{“}x_1^u + x_2^u = x_3^u\text{”} \Rightarrow \text{“}x_1 x_2 x_3 = 0\text{”})$.
It is not clear how to write the formula $x_1^u + x_2^u = x_3^u$
in $L_1\mathrm{Ar}$. Of course, for any concrete u = 1, 2, 3 there is a corresponding
atomic formula in $L_1\mathrm{Ar}$, but how do we make u into a variable? This
is not a trivial problem. In the second part of the book we show how to
find an atomic formula $p(x, u, y, z_1, \ldots, z_n)$ such that the assertion that
$\exists z_1 \cdots \exists z_n\, p(x, u, y, z_1, \ldots, z_n)$ in the domain of natural numbers is equivalent
to $y = x^u$. Then $x_1^u + x_2^u = x_3^u$ can be translated as follows:

$$\exists y_1\, \exists y_2\, \exists y_3(\text{“}x_1^u = y_1\text{”} \wedge \text{“}x_2^u = y_2\text{”} \wedge \text{“}x_3^u = y_3\text{”} \wedge y_1 + y_2 = y_3).$$

The existence of such a p is a nontrivial number-theoretic fact, so that here the
very possibility of performing a translation becomes a mathematical
problem.


3.16. “The Riemann hypothesis.” The Riemann zeta function $\zeta(s)$ is defined
by the series $\sum_{n=1}^{\infty} n^{-s}$ in the half-plane Re s > 1. It can be continued meromorphically
onto the entire complex s-plane. The Riemann hypothesis is the
assertion that the nontrivial zeros of $\zeta(s)$ lie on the line Re s = 1/2. Of course,
in this form the Riemann hypothesis cannot be translated into $L_1\mathrm{Ar}$. However,
there are several purely arithmetic assertions that are demonstrably equivalent
to the Riemann hypothesis. Perhaps the simplest of them is the following.

Let $\mu(n)$ be the Möbius function on the set of integers $\geq 1$: it equals 0 if
n is divisible by a square greater than 1, and equals $(-1)^r$, where r is the number of prime
divisors of n, if n is square-free. We then have

$$\text{Riemann hypothesis} \iff \forall \varepsilon > 0\ \exists x\ \forall y \left[ y > x \Rightarrow \left| \sum_{n=1}^{y} \mu(n) \right| < y^{1/2 + \varepsilon} \right].$$

Only the exponent is not an integer on the right; but $\varepsilon$ need only run through
numbers of the form 1/z, z an integer $\geq 1$, and then we can raise the inequality
to the (2z)th power. The formula

$$\left( \sum_{n=1}^{y} \mu(n) \right)^{2z} < y^{z+2}$$

can then be translated into $L_1\mathrm{Ar}$, although not completely trivially. The necessary
techniques will be developed in the second part of the book.
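
The integer form of the inequality is easy to evaluate for small y. The sketch below is only a numerical illustration under stated assumptions: `mobius` and `integer_inequality_holds` are hypothetical names, trial division is used for simplicity, and a finite check of this kind says nothing about the truth of the hypothesis itself.

```python
def mobius(n):
    """Mobius function: 0 if a square > 1 divides n, else (-1)**(number of prime divisors)."""
    r, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p*p divided the original n
                return 0
            r += 1
        p += 1
    if n > 1:                   # one remaining prime factor
        r += 1
    return (-1) ** r

def integer_inequality_holds(y, z):
    """Test (sum of mu(n) for n = 1..y)**(2z) < y**(z + 2), using integer arithmetic only."""
    m = sum(mobius(n) for n in range(1, y + 1))
    return m ** (2 * z) < y ** (z + 2)

print([mobius(n) for n in range(1, 11)])
print(all(integer_inequality_holds(y, 4) for y in range(2, 200)))
```

The case y = 1 fails for every z, since both sides equal 1; this is one reason the statement is quantified as “there exists x such that for all y > x.”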

The last two examples were given in order to show the complexity that is 
possible in problems that can be stated in $L_1\mathrm{Ar}$, despite the apparent simplicity
of the modes of expression and the semantics of the language. 

We conclude this section with some remarks concerning higher-order 
languages. 


3.17. Higher-order languages. Let L be any first-order language. Its modes 
of expression are limited in principle by one important consideration: we are 
not allowed to speak of arbitrary properties of objects of the theory, that is, 
arbitrary subsets of the set of all objects. Syntactically, this is reflected in the 
prohibition against forming expressions such as $\forall p(p(x))$, where p is a relation
of degree 1; relations must stand for fixed rather than variable properties.

Of course, certain properties can be defined using nonatomic formulas. For
example, in $L_1\mathrm{Ar}$ instead of “x is even” we may write $\exists y(x = (1 + 1) \cdot y)$.
However, there is a continuum of subsets of the integers but only a countable
set of definable properties (see §2 of Chapter II), so there are automatically
properties that cannot be defined by formulas. Thus, it is impossible
to replace the forbidden expression $\forall p(p(x))$ by a sequence of expressions
$P_1(x), P_2(x), P_3(x), \ldots$

Languages in which quantifiers may be applied to properties and/or functions
(and also, possibly, to properties of properties, and so on) are called higher-order
languages. One such language, $L_2\mathrm{Real}$, will be considered in Chapter III
for the purpose of illustrating a simplified version of Cohen forcing.

On the other hand, the same extension of expressive possibilities can be
obtained without leaving $\mathcal{L}_1$. In fact, in the first-order language $L_1\mathrm{Set}$ we may
quantify over all subsets of any set, over all subsets of the set of subsets, and
so on. Informally this means that we are speaking of all properties, all properties
of properties, ... (with transfinite extension). In addition, any higher-order
language with a “standard interpretation” in some type of structured sets can
be translated into $L_1\mathrm{Set}$ so as to preserve the meanings and truth values in
this standard interpretation. (An apparent exception is the languages for
describing Gödel-Bernays classes and “large” categories; but it seems, based
on our present understanding of paradoxes, that no higher-order languages can
be constructed from such a language.)

The attentive reader will notice the contrast between the possibility of writing
a formula in $L_1\mathrm{Set}$ in which $\forall$ is applied to all subsets (informally, to all
properties) of finite ordinals (informally, of integers) and the impossibility of
writing a formula in $L_1\mathrm{Set}$ that would define any concrete subset in the continuum
of undefinable subsets. (There are fewer such subsets in $L_1\mathrm{Set}$ than in
$L_1\mathrm{Ar}$, but still a continuum.) We shall examine these problems more closely in
Chapter II when we discuss “Skolem’s paradox.”

Let us summarize. Almost all the basic logical and set-theoretic principles 
used in the day-to-day work of the mathematician are contained in the first-order
languages and, in particular, in $L_1\mathrm{Set}$. Hence, those languages will be the
subject of study in the first and third parts of the book. But concrete oriented
languages can be formed in other ways, with various degrees of deviation from
the rules of $\mathcal{L}_1$. In addition to $L_2\mathrm{Real}$, examples of such languages examined
in Chapter II include SELF (Smullyan’s language for self-description) and SAr, 
which is a language of arithmetic convenient for proving Tarski’s theorem on 
the undefinability of truth. 


Digression: Syntax 


1. The most important feature that most artificial languages have in common 
is the ability to encompass a rich spectrum of modes of expression starting 
with a small finite number of generating principles. 

In each concrete case the choice of these principles (including the alphabet
and syntax) is based on a compromise between two extremes. Economical use of 
modes of expression leads to unified notation and simplified mechanical analy- 
sis of the text. But then the texts become much longer and farther removed 
from natural language texts. Enriching the modes of expression brings the 
artificial texts closer to the natural language texts, but complicates the syntax 
and the formal analysis. (Compare machine languages with such programming 
languages as Algol, Fortran, Cobol, etc.) 

We now give several examples based on our material. 


2. Dialects of $\mathcal{L}_1$


(a) Without changing the logic in $\mathcal{L}_1$, it is possible to discard parentheses and
either of the two quantifiers from the alphabet, and to replace all the connectives
by one, namely $\downarrow$ (conjunction of negations). (In addition, constants
could be declared to be functions of degree 0, and functions could
be interpreted as relations.)


This is accomplished by the following change in the definitions. If $t_1, \ldots, t_r$
are terms, f is an operation of degree r, and p is a relation of degree r, then
$f t_1 \ldots t_r$ is a term, and $p t_1 \ldots t_r$ is an atomic formula. If P and Q are formulas,
then $\downarrow PQ$ and $\forall x P$ are formulas. The content of $\downarrow PQ$ is “not P and not Q,”
so that we have the following expressions in this dialect:


$\neg(P)$: $\downarrow PP$,
$(P) \wedge (Q)$: $\downarrow\downarrow PP \downarrow QQ$,
$(P) \vee (Q)$: $\downarrow\downarrow PQ \downarrow PQ$.


Clearly, economizing on parentheses and connectives leads to much repetition 
of the same formula. Nevertheless, it may become simpler to prove theorems 
about such a language because of the shorter list of syntactic norms. 
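
The three expressions of this dialect can be confirmed by truth tables. The following sketch covers only the propositional fragment and rests on assumptions of its own: the ASCII token `nor` stands in for the single connective, formulas are lists of tokens in prefix order, and `nor` and `evaluate` are hypothetical names.

```python
from itertools import product

def nor(p, q):
    """The single connective of the dialect: 'not p and not q'."""
    return (not p) and (not q)

def evaluate(tokens, values):
    """Evaluate a parenthesis-free prefix expression; return (truth value, unread tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head == "nor":
        left, rest = evaluate(rest, values)
        right, rest = evaluate(rest, values)
        return nor(left, right), rest
    return values[head], rest       # a propositional letter

for P, Q in product([False, True], repeat=2):
    v = {"P": P, "Q": Q}
    assert evaluate("nor P P".split(), v) == ((not P), [])
    assert evaluate("nor nor P P nor Q Q".split(), v) == ((P and Q), [])
    assert evaluate("nor nor P Q nor P Q".split(), v) == ((P or Q), [])
```

In each case the list of unread tokens comes back empty, so the expression is consumed exactly once from left to right, even though no parentheses are present.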


(b) Bourbaki’s language of set theory has an alphabet consisting of the signs
$\square$, $\tau$, $\vee$, $\neg$, $=$, $\in$ and the letters. Expressions in this language are not
simply sequences of signs in the alphabet, but sequences in which certain
elements are paired together by superlinear connectives. For example:

$$\tau \vee \neg \in \square A \in \square A'$$

(in the original, links drawn above the line join $\tau$ to each $\square$; they are not reproduced here).
The main difference between Bourbaki’s language and $L_1\mathrm{Set}$ is the use of the
“Hilbert choice symbol.” If, for example, $\in xy$ is the formula “x is an element
of y,” then

$$\tau \in \square y$$

(with a link joining $\tau$ to $\square$) is a term meaning “some element of the set y.”

Bourbaki’s language is not very convenient and is not widely used. It became
known in the popular literature thanks to an example of a very long abbreviated
notation for the term “one,” which the authors imprudently introduced:

$$\begin{aligned}
&\tau_Z((\exists u)(\exists U)(u = (U, \{\varnothing\}, Z) \wedge U \subset \{\varnothing\} \times Z \wedge (\forall x)((x \in \{\varnothing\}) \Rightarrow (\exists y)((x, y) \in U)) \\
&\quad \wedge (\forall x)(\forall y)(\forall y')(((x, y) \in U \wedge (x, y') \in U) \Rightarrow (y = y')) \wedge (\forall y)((y \in Z) \Rightarrow (\exists x)((x, y) \in U)))).
\end{aligned}$$

It would take several tens of thousands of symbols to write out this term
completely; this seems a little too much for “one.”


(c) A way to greatly extend the expressive possibilities of almost any language
in $\mathcal{L}_1$ is to allow “class terms” of the type $\{x \mid P(x)\}$, meaning “the class of
all objects x having the property P.” This idea was used by Morse in his
language of set theory and by Smullyan in his language of arithmetic; see 
§10 of Chapter II. 


3. General remarks. Most natural and artificial languages are characteristically 
discrete and linear (one-dimensional). On the one hand, our perception of 
the external world is not felt by us to be either discrete or linear, although 
these characteristics are observed on the level of physiological mechanisms 
(coding by impulses in the nervous system). On the other hand, the lan- 
guages in which we communicate tend to transmit information in a sequence 
of distinguishable elementary signs. The main reason for this is probably 
the much greater (theoretically unlimited) uniqueness and reproducibility 
of information than is possible with other methods of conveyance. Compare 
with the well-known advantages of digital over analog computers. 


The human brain clearly uses both principles. The perception of images as 
a whole, along with emotions, is more closely connected with nonlinear and
nondiscrete processes—perhaps of a wave nature. It is interesting to examine 
from this point of view the nonlinear fragments in various languages. 

In mathematics this includes, first of all, the use of drawings. But this use 
does not lend itself to formal description, with the exception of the separate 
and formalized theory of graphs. Graphs are especially popular objects, because 
they are as close as possible both to their visual image as a whole and to their 
description using all the rules of set theory. Every time we are able to connect a 
problem with a graph, it becomes much simpler to discuss it, and large sections 
of verbal description are replaced by manipulation with pictures. 


A less well-known class of examples is the commutative diagrams and spec- 
tral sequences of homological algebra. A typical example is the “snake lemma.” 
Here is its precise formulation. 

Suppose we are given a commutative diagram of abelian groups and 
homomorphisms between them (in the box below), in which the rows are exact 
sequences: 
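
[Diagram not reproduced: a box containing two exact rows joined by the vertical homomorphisms f, g, h, surrounded by the sequence]

$$0 \to \operatorname{Ker} f \to \operatorname{Ker} g \to \operatorname{Ker} h \dashrightarrow \operatorname{Coker} f \to \operatorname{Coker} g \to \operatorname{Coker} h \to 0$$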


Then the kernels and cokernels of the “vertical” homomorphisms f, g, h form 
a six-term exact sequence, as shown in the drawing, and the entire diagram of 
solid arrows is commutative. The “snake” morphism Ker h — Coker f, which 
is denoted by the dotted arrow, is the basic object constructed in the lemma. 

Of course, it is easy to describe the snake diagram sequentially in a suitable, 
more or less formal, linear language. However, such a procedure requires an 
artificial and not uniquely determined breaking up of a clearly two-dimensional 
picture (as in scanning a television image). Moreover, without having the overall 
image in mind, it becomes harder to recognize the analogous situation in other 
contexts and to bring the information together into a single block. 

The beginnings of homological algebra saw the enthusiastic recognition of 
useful classes of diagrams. At first this interest was even exaggerated; see the 
editor’s appendix to the Russian translation of Homological Algebra by Cartan 
and Eilenberg. 

There is one striking example of an entire book with an intentional two- 
dimensional (block) structure: C. H. Lindsey and S. G. van der Meulen, Informal 
Introduction to Algol 68 (North-Holland, Amsterdam, 1971). It consists of eight 
chapters, each of which is divided into seven sections (eight of the 56 sections 
are empty, to make the system work!). Let (i, j) be the name of the jth section
of the ith chapter; then the book can be studied either “row by row” or “column
by column” in the (i, j) matrix, depending on the reader’s intentions.

As with all great undertakings, this is the fruit of an attempt to solve what
is in all likelihood an insoluble problem, since, as the authors remark, Algol 68
“is quite impossible to describe ... until it has been described.”


II 
Truth and Deducibility 


1 Unique Reading Lemma 


The basic content of this section is Lemma 1.4 and Definitions 1.5 and 1.6. The
lemma guarantees that the terms and formulas of any language in $\mathcal{L}_1$ can be
deciphered in a unique way, and it serves as a basis for most inductive arguments.
(The reader may take the lemma on faith for the time being, provided
that he was able independently to verify the last formula in 3.7 of Chapter I.
However, the proof of the lemma will be needed in §4 of Chapter VII.) It is
important to remember that the theory of any formal language begins by checking
that the syntactic rules are free of ambiguity.

We begin with the standard combinatoric definitions, in order to fix the 
terminology. 


1.1. Let A be a set. By a sequence of length n of elements of A we mean a
mapping from the set $\{1, \ldots, n\}$ to A. The image of i is called the ith term of
the sequence. Corresponding to n = 0 we have the empty sequence. Sequences
of length 1 will sometimes be identified with elements of A.

A sequence of length n can also be written in the form $a_1, \ldots, a_i, \ldots, a_n$,
where $a_i$ is its ith term. The number i is called the index of the term $a_i$. If
$P = (a_1, \ldots, a_n)$ and $Q = (b_1, \ldots, b_m)$ are two sequences, their concatenation
PQ is the sequence $(a_1, \ldots, a_n, b_1, \ldots, b_m)$ of length m + n whose ith term is $a_i$
for $i \leq n$ and $b_{i-n}$ for $n + 1 \leq i \leq n + m$. We similarly define the concatenation
of a finite sequence of sequences.

An occurrence of the sequence Q in P is any representation of P as a
concatenation $P_1 Q P_2$. Substituting a sequence R in place of a given occurrence
of Q in P amounts to constructing the sequence $P_1 R P_2$.

Let $\Pi^+$, $\Pi^-$ be two disjoint subsets of $\{1, \ldots, n\}$. A map $c\colon \Pi^+ \to \Pi^-$
is called a parentheses bijection if it is bijective and satisfies the following
conditions:


(a) $c(i) > i$ for all $i \in \Pi^+$;
(b) for every i and j, $j \in [i, c(i)]$ if and only if $c(j) \in [i, c(i)]$.
Yu. I. Manin, A Course in Mathematical Logic for Mathematicians, Second Edition, 19 

1.2. Lemma. Given $\Pi^+$ and $\Pi^-$, if a parentheses bijection exists, then it is
unique.


This lemma will be applied to expressions in languages in $\mathcal{L}_1$: $\Pi^+$ will
consist of the indices of the places in the expression at which “(” occurs, $\Pi^-$
will consist of the indices of the places at which “)” occurs, and the map c
correlates to each left parenthesis the corresponding right parenthesis.


PROOF OF THE LEMMA. Let the function $\varepsilon\colon \{1, \ldots, n\} \to \{0, \pm 1\}$ take the value
1 on $\Pi^+$, $-1$ on $\Pi^-$, and 0 everywhere else. We claim that for every $i \in \Pi^+$,
for any parentheses bijection $c\colon \Pi^+ \to \Pi^-$, and for any k, $1 \leq k \leq c(i) - i$, we
have the relations

$$\sum_{j=i}^{c(i)} \varepsilon(j) = 0, \qquad \sum_{j=i}^{c(i)-k} \varepsilon(j) > 0.$$

The lemma follows immediately from these relations, since we obtain the
following recipe for determining c from $\Pi^+$ and $\Pi^-$: c(i) is the least $l > i$ for
which $\sum_{j=i}^{l} \varepsilon(j) = 0$.

The first relation holds because the elements of $\Pi^+$ and $\Pi^-$ that appear in
the interval [i, c(i)] do so in pairs (j, c(j)), and $\varepsilon(j) + \varepsilon(c(j)) = 0$.

To prove the second relation, suppose that for some i and k we have
$\sum_{j=i}^{c(i)-k} \varepsilon(j) \leq 0$. Since $\varepsilon(i) = 1$, it follows that $\sum_{j=i+1}^{c(i)-k} \varepsilon(j) < 0$. Hence, the
number of elements of $\Pi^-$ in the interval $[i + 1, c(i) - k]$ is strictly greater than
the number from $\Pi^+$. Let $c(j_0) \in \Pi^-$ be an element in the interval such that
$j_0 \notin [i + 1, c(i) - k]$. Then $j_0 \leq i$, and in fact, $j_0 < i$, since c(i) is outside the
interval. But then only one element of the pair $j_0$, $c(j_0)$ lies in [i, c(i)], which
contradicts the definition of c.
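
The recipe obtained in the proof, namely that c(i) is the least l > i at which the partial sums of the ±1 function first return to zero, translates directly into a short program. The sketch below assumes that an expression is given as a Python string and uses 1-based indices as in 1.1; `parentheses_bijection` is a hypothetical name.

```python
def parentheses_bijection(expr):
    """Return {i: c(i)} for the 1-based positions i of '(' in expr, or None if none exists."""
    eps = [1 if s == "(" else -1 if s == ")" else 0 for s in expr]
    c = {}
    for i, e in enumerate(eps, start=1):
        if e != 1:
            continue
        total = 0
        for l in range(i, len(eps) + 1):
            total += eps[l - 1]
            if total == 0:          # least l > i with eps(i) + ... + eps(l) = 0
                c[i] = l
                break
        else:
            return None             # this '(' is never closed
    # Every ')' must be the image of some '(' for c to be a bijection.
    if sorted(c.values()) != [i for i, e in enumerate(eps, start=1) if e == -1]:
        return None
    return c

print(parentheses_bijection("f(x,g(y))"))
print(parentheses_bijection(")("))
```

The final comparison is needed because the recipe alone only pairs off left parentheses; an unmatched right parenthesis would otherwise go unnoticed.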


1.3. Now let A be the alphabet of a language L in $\mathcal{L}_1$ (see §2 of Chapter I).
Finite sequences of elements of A are the expressions in this language. Certain
expressions have been distinguished as formulas or terms. We recall that the
definitions in §2 of Chapter I imply that:


(a) Any term in L either is a constant, is a variable, or is represented in the
form $f(t_1, \ldots, t_r)$, where f is an operation of degree r, and $t_1, \ldots, t_r$ are terms
shorter in length.

(b) Any formula in L is represented either in the form $p(t_1, \ldots, t_r)$, where p is
a relation of degree r and $t_1, \ldots, t_r$ are terms shorter in length, or in one of the
seven forms

$$(P) \Leftrightarrow (Q),\quad (P) \Rightarrow (Q),\quad (P) \vee (Q),\quad (P) \wedge (Q),\quad \neg(P),\quad \forall x(P),\quad \exists x(P),$$

where P and Q are formulas shorter in length, and x is a variable.

The following result is then obtained by induction on the length of the
expression: if E is a term or a formula, then there exists a parentheses bijection
between the set $\Pi^+$ of indices of left parentheses in E and the set $\Pi^-$
of indices of right parentheses. In fact, the new parentheses in 1.3(a) and (b)
have a natural bijection, while the old ones (which might be contained in the
terms $t_1, \ldots, t_r$ or the formulas P, Q) have such a bijection by the induction
assumption. In addition, the new parentheses never come between two paired
old parentheses.

We can now state the basic result of this section: 


1.4. Unique Reading Lemma. Every expression in L is either a term, or 
a formula, or neither. These alternatives, as well as all of the alternatives 
listed in 1.3(a) and (b), are mutually exclusive. Every term (resp. formula) 
can be represented in exactly one of the forms in 1.3(a) (resp. 1.3(b)), and in a
unique way. 

In addition, in the course of the proof we show that if an expression is the 
concatenation of a finite sequence of terms, then it is uniquely representable as 
such a concatenation. 

PROOF. Using induction on the length of the expression E, we describe
an informal algorithm for syntactic analysis, which uniquely determines which
alternative holds.


(a) If there are no parentheses in E, then E is either a constant term, a variable
term, or neither a term nor a formula.

(b) If E contains parentheses, but there is no parentheses bijection between the
left and right parentheses, then E is neither a term nor a formula.

(c) Suppose E contains parentheses with a parentheses bijection. Then either
E is uniquely represented in one of the nine forms


$f(E_0)$ (where f is an operation),
$p(E_0)$ (where p is a relation),
$(E_1) \Leftrightarrow (E_2)$, $(E_1) \Rightarrow (E_2)$, $(E_1) \vee (E_2)$, $(E_1) \wedge (E_2)$,
$\neg(E_3)$, $\forall x(E_3)$, $\exists x(E_3)$,


or else E is neither a term nor a formula. Here the pairs of parentheses we have
written out are connected by the unique parentheses bijection that is assumed
to exist in E; this is what ensures uniqueness. In fact, we obtain the form $f(E_0)$
if and only if the first element of the expression is a function, the second element
is “(”, and the last element is the “)” that corresponds under the bijection; and
similarly for the other forms.

We have thereby reduced the problem to the syntactic analysis of the
expressions $E_0$, $E_1$, $E_2$, $E_3$, which are shorter in length. This almost completes
our description of the algorithm, since what remains to be determined about
$E_1$, $E_2$, $E_3$ is whether they are formulas. However, for $E_0$ we must determine
whether this expression is a concatenation of the right number of terms, and
we must ask whether such a representation must be unique.

The answer to the latter question is positive. We have the following recipe
for breaking off terms from left to right in a concatenation of terms.

(d) Let $E_0$ be an expression having a parentheses bijection between its left
and right parentheses. If $E_0$ can be represented in the form $tE_0'$, where t is
a term, then this representation is unique. In fact, either $E_0$ can be uniquely
represented in one of the forms

$$xE_0', \quad cE_0', \quad f(E_0'')E_0'$$

(where x is a variable, c is a constant, and f is an operation whose parentheses
correspond under the unique parentheses bijection in $E_0$), or else $E_0$ cannot
be represented in the form $tE_0'$, where t is a term. In the cases $E_0 = xE_0'$ or
$E_0 = cE_0'$, this is obviously the only way to break off a term from the left. In
the case $E_0 = f(E_0'')E_0'$, the question reduces to whether $E_0''$ is a concatenation
of degree(f) terms. By induction on the length of $E_0$, we may assume that
either $E_0''$ is not such a concatenation, or else it is uniquely representable as a
concatenation of terms. The lemma is proved.
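
The informal algorithm of the proof can be sketched as a small recognizer. Everything about the alphabet below is an assumption made for illustration: a toy signature with one-character symbols, and the ASCII stand-ins `!` for negation, `&`, `|`, `>`, `~` for the four binary connectives, and `A`, `E` for the two quantifiers. The function names are hypothetical as well.

```python
VARIABLES = frozenset("xyzuv")
CONSTANTS = frozenset("01")
OPERATIONS = {"+": 2, "*": 2}      # operation symbol -> degree
RELATIONS = {"=": 2, "<": 2}       # relation symbol -> degree
BINARY = frozenset("&|>~")         # stand-ins for the binary connectives
QUANTIFIERS = frozenset("AE")      # stand-ins for the two quantifiers

def closing(expr, i):
    """Index of the ')' paired with the '(' at index i, or None if it is never closed."""
    depth = 0
    for j in range(i, len(expr)):
        depth += (expr[j] == "(") - (expr[j] == ")")
        if depth == 0:
            return j
    return None

def split_terms(expr):
    """Step (d): break terms off from the left; return the list of terms, or None."""
    terms = []
    while expr:
        head = expr[0]
        if head in VARIABLES or head in CONSTANTS:
            end = 1
        elif head in OPERATIONS and expr[1:2] == "(":
            close = closing(expr, 1)
            if close is None:
                return None
            args = split_terms(expr[2:close])
            if args is None or len(args) != OPERATIONS[head]:
                return None
            end = close + 1
        else:
            return None
        terms.append(expr[:end])
        expr = expr[end:]
    return terms

def is_term(expr):
    terms = split_terms(expr)
    return terms is not None and len(terms) == 1

def is_formula(expr):
    """Steps (a)-(c): peel off the outermost form and recurse on shorter expressions."""
    last = len(expr) - 1
    head = expr[:1]
    if head in RELATIONS and expr[1:2] == "(" and closing(expr, 1) == last:
        args = split_terms(expr[2:last])
        return args is not None and len(args) == RELATIONS[head]
    if head == "!" and expr[1:2] == "(" and closing(expr, 1) == last:
        return is_formula(expr[2:last])
    if head in QUANTIFIERS and expr[1:2] in VARIABLES and expr[2:3] == "(" \
            and closing(expr, 2) == last:
        return is_formula(expr[3:last])
    if head == "(":
        mid = closing(expr, 0)
        if mid is None or expr[mid + 1:mid + 2] not in BINARY or expr[mid + 2:mid + 3] != "(":
            return False
        return (closing(expr, mid + 2) == last
                and is_formula(expr[1:mid]) and is_formula(expr[mid + 3:last]))
    return False

print(is_formula("Ax((=(xy))&(=(yx)))"))
print(split_terms("x+(yz)0"))
```

At every step the outermost form is fixed by the first symbol and by the parenthesis paired with the first left parenthesis, so the recognizer never has to backtrack; this is the computational content of uniqueness.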


EXERCISE: State and prove a unique reading lemma for the “parentheses-less” dialect
of $\mathcal{L}_1$ described in 2(a) of “Digression: Syntax” in Chapter I.


Here is the first inductive description of the difference between free and 
bound occurrences of a variable in terms and formulas. The correctness of the 
following definitions is ensured by Lemma 1.4. 


1.5. Definition. 


(a) Every occurrence of a variable in an atomic formula or term is free. 

(b) Every occurrence of a variable in $\neg(P)$ or in $(P_1) * (P_2)$ (where $*$ is any of
the connectives “$\vee$”, “$\wedge$”, “$\Rightarrow$”, “$\Leftrightarrow$”) is free (respectively bound) if and
only if the corresponding occurrence in $P$, $P_1$, or $P_2$ is free (respectively
bound).

(c) Every occurrence of the variable $x$ in $\forall x(P)$ and $\exists x(P)$ is bound. The
occurrences of other variables in $\forall x(P)$ and $\exists x(P)$ are free or bound
according to whether the corresponding occurrences in $P$ are free or bound.


Suppose the quantifier $\forall$ (or $\exists$) occurs in the formula $P$. It follows from the
definitions that it must be followed in $P$ by a variable and a left parenthesis.
The expression that begins with this variable and ends with the corresponding
right parenthesis is called the scope of the given (occurrence of the) quantifier.


1.6. Definition. Suppose we are given a formula $P$, a free occurrence of the
variable $x$ in $P$, and a term $t$. We say that $t$ is free for the given occurrence of
$x$ in $P$ if the occurrence does not lie in the scope of any quantifier of the form
$\exists y$ or $\forall y$, where $y$ is a variable occurring in $t$.


In other words, if $t$ is substituted in place of the given occurrence of $x$, all
free occurrences of variables in $t$ remain free in $P$.

We usually have to substitute a term for each free occurrence of a given
variable. It is important to note that this operation takes terms into terms and
formulas into formulas (induction on the length). If $t$ is free for each free
occurrence of $x$ in $P$ we simply say that $t$ is free for $x$ in $P$.
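
Definitions 1.5 and 1.6 translate directly into recursive functions. The following Python sketch is an illustration only: the tuple encoding of terms and formulas and the names `term_vars`, `free_vars`, and `is_free_for` are assumptions made here, not notation from the text.

```python
# Assumed encoding (hypothetical):
#   terms:    ("var", x) | ("const", c) | ("fn", f, [t1, ..., tr])
#   formulas: ("atom", p, [t1, ..., tr]) | ("not", P) | ("bin", op, P1, P2)
#             | ("forall", x, P) | ("exists", x, P)

def term_vars(t):
    """Set of variables occurring in the term t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "const":
        return set()
    return set().union(*[term_vars(s) for s in t[2]])


def free_vars(P):
    """Variables with at least one free occurrence in P (Definition 1.5)."""
    kind = P[0]
    if kind == "atom":
        return set().union(*[term_vars(t) for t in P[2]])
    if kind == "not":
        return free_vars(P[1])
    if kind == "bin":
        return free_vars(P[2]) | free_vars(P[3])
    return free_vars(P[2]) - {P[1]}       # the quantifier binds P[1]


def is_free_for(t, x, P, binders=frozenset()):
    """True if t is free for x in P (Definition 1.6): no free occurrence of x
    lies in the scope of a quantifier on a variable occurring in t."""
    kind = P[0]
    if kind == "atom":
        occurs = x in set().union(*[term_vars(s) for s in P[2]])
        return not (occurs and (binders & term_vars(t)))
    if kind == "not":
        return is_free_for(t, x, P[1], binders)
    if kind == "bin":
        return (is_free_for(t, x, P[2], binders)
                and is_free_for(t, x, P[3], binders))
    if P[1] == x:
        return True                       # below this point x is bound
    return is_free_for(t, x, P[2], binders | {P[1]})


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
P = ("exists", "y", ("atom", "in", [x, y]))   # exists y (x in y)
```

For this `P`, the only free variable is `x`; the term `z` is free for `x` in `P`, while the term `y` is not, because the free occurrence of `x` lies in the scope of the quantifier on `y`.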


1.7. We shall start working with Definitions 1.5 and 1.6 in the next section. 
Here we shall only give some intuitive explanations. 

Definition 1.5 allows us to introduce the important class of closed formulas. 
By definition, this consists of formulas without free variables. (They are also 
called sentences.) The intuitive meaning of the concept of a closed formula is as 
follows. A closed formula corresponds to an assertion that is completely deter- 
mined (in particular, regarding truth or falsity); indeterminate objects of the 
theory are mentioned only in the context “all objects x satisfy the condition ...” 
or “there exists an object y with the property ... .” Conversely, a formula that 
is not closed, such as $x \in y$ or $\exists x(x \in y)$, may be true or false depending on
what sets are being designated by the names $x$ and $y$ (for the first) or by the
name y (for the second). Here truth or falsity is understood to mean for a fixed 
interpretation of the language, as will be explained in §2. 

In particular, Definition 1.6 gives the rules of hygiene for changing notation. 
If we want to call an indeterminate object x by another name y in a given 
formula, we must be sure that x does not appear in the parts of the formula 
where this name y is already being used to denote an arbitrary indeterminate 
object (after a quantifier). In other words, $y$ must be free for $x$. Moreover, if we
want to say that $x$ is obtained from certain operations on other indeterminate
objects ($x$ = a term containing $y_1, \ldots, y_n$), then the variables $y_1, \ldots, y_n$ must
not be bound.

There is a close parallel to these rules in the language of analysis: instead
of $\int_1^x f(y)\,dy$ we may confidently write $\int_1^x f(z)\,dz$, but we must not write
$\int_1^x f(x)\,dx$; the variable $y$ is bound, in the scope of $\int f(y)\,dy$.


2 Interpretation: Truth, Definability 


2.1. Suppose we are given a language $L$ in $\mathcal{L}_1$ and a set (or class) $M$. To give
an interpretation of $L$ in $M$ means to tell how a formula in $L$ can be given a
meaning as a statement about the elements of $M$.

More precisely, an interpretation $\phi$ of the language $L$ in $M$ consists of a
collection of mappings that correlate terms and formulas of the language to 
elements of M and structures over M (in the sense of Bourbaki). These 
mappings are divided into primary mappings, which actually determine the 
interpretation, and secondary mappings, which are constructed in a natural 
and unique way from the primary mappings. We shall use the term interpreta- 
tion to refer to the mappings themselves, and sometimes also to the values they 
take. 

Let us proceed to the systematic definitions. We shall sometimes call the 
elements of the alphabet of $L$ symbols. The notation $\phi$ for the interpretation
will either be included when the mappings are written or omitted, depending 
on the context. 


2.2. Primary mappings


(a) An interpretation of the constants is a map from the set of symbols for
constants (in the alphabet of $L$) to $M$ that takes a symbol $c$ to $\phi(c) \in M$.

(b) An interpretation of the operations is a map from the set of symbols for
operations (in the alphabet of $L$) that takes a symbol $f$ of degree $r$ to a
function $\phi(f)$ on $M \times \cdots \times M = M^r$ with values in $M$.

(c) An interpretation of the relations is a map from the set of symbols for
relations (in the alphabet of $L$) that takes a symbol $p$ of degree $r$ to a
subset $\phi(p) \subset M^r$.


Secondary mappings. Intuitively, we would like to interpret variables as names
for the “generic element” of the set $M$, which can be given specific values
in $M$. We would like to interpret the term $f(x_1, \ldots, x_r)$ as a function $\phi(f)$ of $r$
arguments that run through values in $M$, and so on.

In order to give a precise definition, we introduce the interpretation
class $\overline{M}$:

    $\overline{M}$ = the set of all maps to $M$ from the set of symbols for variables in
    the alphabet of $L$.

Thus, every point $\xi \in \overline{M}$ correlates to any variable $x$ a value $\phi(x)(\xi) \in M$,
which we shall usually denote simply by $x^\xi$. This allows us to consider variables
as functions on $\overline{M}$ with values in $M$. More generally:


2.3. The interpretation of terms correlates to each term $t$ a function $\phi(t)$ on $\overline{M}$
with values in $M$. This correspondence is defined inductively by the following
compatibilities:


(a) If $c$ is a constant, then $\phi(c)$ is the constant function whose value is defined
by the primary mapping.

(b) If $x$ is a variable, then $\phi(x)$ is $\phi(x)(\xi)$ as a function of $\xi$.

(c) If $t = f(t_1, \ldots, t_r)$, then for all $\xi \in \overline{M}$,


$$\phi(t)(\xi) = \phi(f)(\phi(t_1)(\xi), \ldots, \phi(t_r)(\xi)),$$


where the $\phi(t_i)(\xi)$ are defined by the induction assumption, and
$\phi(f) : M^r \to M$ is given by the primary mapping. Instead of $\phi(t)(\xi)$ we shall
sometimes write simply $t^\xi$.
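
The inductive definition in 2.3 is a three-case recursion. A minimal Python sketch follows, assuming a tuple encoding of terms and representing a point of the interpretation class by a dictionary that records the values of only the variables actually needed; the names `eval_term`, `consts`, and `ops` are hypothetical.

```python
def eval_term(t, consts, ops, xi):
    """Value of the term t at the point xi.

    t      -- ("const", c) | ("var", x) | ("fn", f, [t1, ..., tr])
    consts -- primary mapping for constants: symbol -> element of M
    ops    -- primary mapping for operations: symbol -> Python function
    xi     -- dict sending variable names to elements of M
    """
    kind = t[0]
    if kind == "const":                   # case (a)
        return consts[t[1]]
    if kind == "var":                     # case (b)
        return xi[t[1]]
    args = [eval_term(s, consts, ops, xi) for s in t[2]]
    return ops[t[1]](*args)               # case (c)


# Primary mappings of a small arithmetic-style language, interpreted in the integers.
consts = {"0": 0, "1": 1}
ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

# The term (x + 1) * x evaluated at a point with x = 3.
t = ("fn", "*", [("fn", "+", [("var", "x"), ("const", "1")]), ("var", "x")])
value = eval_term(t, consts, ops, {"x": 3})
```

With these choices `value` is (3 + 1) * 3 = 12.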


2.4. Interpretation of atomic formulas. An interpretation $\phi$ assigns to every
formula $P$ in $L$ a truth function $|P|_\phi$. This is a function on the interpretation
class $\overline{M}$ that takes only the values 0 (“false”) and 1 (“true”). It is defined for
atomic formulas as follows:


$$|p(t_1, \ldots, t_r)|_\phi(\xi) = \begin{cases} 1, & \text{if } (t_1^\xi, \ldots, t_r^\xi) \in \phi(p), \\ 0, & \text{otherwise.} \end{cases}$$


Intuitively, a statement $p$ about the names $t_1, \ldots, t_r$ for objects in $M$ becomes
true if the objects named by $t_1, \ldots, t_r$ satisfy the relation named by $p$.


2.5. Interpretation of formulas. The truth function for nonatomic formulas is
defined inductively by means of the following relations (for brevity, we have
omitted parentheses and explicit mention of $\phi$ and $\xi$):


$|P \Leftrightarrow Q| = |P||Q| + (1 - |P|)(1 - |Q|)$:
$P \Leftrightarrow Q$ is true when either $P$ and $Q$ are both true or $P$ and $Q$ are both false.

$|P \Rightarrow Q| = 1 - |P| + |P||Q|$:
$P \Rightarrow Q$ is false only when $P$ is true and $Q$ is false.

$|P \vee Q| = \max(|P|, |Q|)$:
$P \vee Q$ is false only when $P$ and $Q$ are both false.

$|P \wedge Q| = \min(|P|, |Q|)$:
$P \wedge Q$ is true only when $P$ and $Q$ are both true.

$|\neg P| = 1 - |P|$:
$\neg P$ is false only when $P$ is true.

Finally, we must describe what happens when quantifiers are introduced.
Suppose that $\xi \in \overline{M}$ and $x$ is a variable. By a variation of $\xi$ along $x$ we mean
any point $\xi' \in \overline{M}$ for which $y^{\xi'} = y^\xi$ whenever $y$ is a variable different from $x$.
Then


$$|\forall x P|(\xi) = \min_{\xi'} |P|(\xi'),$$


$$|\exists x P|(\xi) = \max_{\xi'} |P|(\xi'),$$


where $\xi'$ runs through all variations of $\xi$ along $x$.

A formula $P$ is called $\phi$-true if $|P|_\phi(\xi) = 1$ for all $\xi \in \overline{M}$. The interpretation
$\phi$ (or $M$) is called a model for a set of formulas $\mathcal{E}$ if all the elements of $\mathcal{E}$ are
$\phi$-true.
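
When $M$ is finite, the truth function of 2.4 and 2.5 can be computed by direct recursion. The sketch below is an illustration under assumptions that are not part of the text: $M$ is a finite nonempty list, formulas are encoded as tuples, a point of the interpretation class is a dictionary holding only the relevant variables, and all names (`truth`, `eval_term`, `phi`) are hypothetical. A variation of a point along a variable is modelled by overwriting that variable's entry.

```python
def eval_term(t, phi, xi):
    """Value of a term at the point xi (dict: variable name -> element of M)."""
    if t[0] == "var":
        return xi[t[1]]
    if t[0] == "const":
        return phi["consts"][t[1]]
    return phi["ops"][t[1]](*[eval_term(s, phi, xi) for s in t[2]])


def truth(P, phi, xi):
    """Truth value of P at the point xi, following the formulas of 2.4 and 2.5."""
    kind = P[0]
    if kind == "atom":
        point = tuple(eval_term(t, phi, xi) for t in P[2])
        return 1 if point in phi["rels"][P[1]] else 0
    if kind == "not":
        return 1 - truth(P[1], phi, xi)
    if kind == "bin":
        p, q = truth(P[2], phi, xi), truth(P[3], phi, xi)
        table = {"iff": p * q + (1 - p) * (1 - q), "imp": 1 - p + p * q,
                 "or": max(p, q), "and": min(p, q)}
        return table[P[1]]
    # Quantifier on the variable P[1]: run through all variations of xi along it.
    values = [truth(P[2], phi, {**xi, P[1]: m}) for m in phi["M"]]
    return min(values) if kind == "forall" else max(values)


# A three-element set with the usual order as its only relation.
M = [0, 1, 2]
phi = {"M": M, "consts": {}, "ops": {},
       "rels": {"<": {(a, b) for a in M for b in M if a < b}}}
x, y = ("var", "x"), ("var", "y")
has_larger = ("exists", "y", ("atom", "<", [x, y]))   # exists y (x < y)
unbounded = ("forall", "x", has_larger)               # a closed formula
```

In this interpretation `has_larger` is true at points with `x` equal to 0 or 1 and false when `x` is 2, so the closed formula `unbounded` is false; its truth value does not depend on the point at which it is evaluated.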


2.6. EXAMPLE: STANDARD INTERPRETATION OF $L_1\mathrm{Ar}$. This is the interpretation
in the set $N$ of nonnegative integers, in which the constant symbols 0, 1 are
interpreted as the numbers 0, 1, respectively, and $+$, $\cdot$, $=$ are interpreted as
addition, multiplication, and equality, respectively.


2.7. EXAMPLE: STANDARD INTERPRETATION OF $L_1\mathrm{Set}$. This is the interpretation
in the von Neumann universe $V$, in which $\emptyset$ is interpreted as the empty
set, $\in$ is interpreted as the relation “is an element in,” and $=$ is interpreted as
equality.

All of the examples of translations in Chapter I were based on these standard
interpretations. The relationship between those examples and the above
definitions is as follows. Let $\Pi(x, y, z)$ be a statement in argot about the
indeterminate sets $x, y, z$ in $V$, and let $P(x, y, z)$ be a translation of $\Pi$ into
the language $L_1\mathrm{Set}$. Then for any point $\xi$ interpreting $x, y, z$ as the names of
sets $x^\xi, y^\xi, z^\xi$ in the von Neumann universe, we have:


$$\Pi(x^\xi, y^\xi, z^\xi) \text{ is true} \iff |P(x, y, z)|(\xi) = 1.$$


Thus, every formula expresses, or defines, a property of objects in the
interpretation set:


2.8. Definition. A set $S \subset M^r$, $r \geq 1$, is called $\phi$-definable (by the formula $P$
in $L$ with the interpretation $\phi$) if there exist variables $x_1, \ldots, x_r$ such that


$$|P|_\phi(\xi) = 1 \iff (x_1^\xi, \ldots, x_r^\xi) \in S$$


for all $\xi$ in $\overline{M}$.

One of the most important problems concerning formal languages is to
understand the structure of the sets of


    $\phi$-true formulas in $L$;

    $\phi$-definable sets in $\bigcup_{r \geq 1} M^r$.


2.9. EXAMPLE. The sets definable by means of $L_1\mathrm{Ar}$ with the standard
interpretation constitute the smallest class of sets in $\bigcup_r N^r$ that


(a) contains all sets of the form

$$\{(k_1, \ldots, k_r) \mid F(k_1, \ldots, k_r) = 0\} \subset N^r,$$

where $F$ runs through all polynomials with integral coefficients;

(b) is closed relative to finite intersections, unions, and complements (in the
appropriate $N^r$);

(c) is closed relative to the projections $\mathrm{pr}_i : N^r \to N^{r-1}$:

$$\mathrm{pr}_i(k_1, \ldots, k_r) = (k_1, \ldots, k_{i-1}, k_{i+1}, \ldots, k_r).$$


In fact, sets of type (a) are defined by atomic formulas of the form $t_1 = t_2$,
where $t_1$ is a term corresponding to the sum of the monomials in $F$ with positive
coefficients, and $t_2$ corresponds to the sum of the monomials with negative
coefficients (taken with the opposite sign). Further, if $S_1, S_2 \subset N^r$ are definable
by formulas $P_1, P_2$ (with the same variables), then $S_1 \cap S_2$ is definable by
$P_1 \wedge P_2$, $S_1 \cup S_2$ is definable by $P_1 \vee P_2$, and $N^r \setminus S_1$ is definable by $\neg P_1$.
Finally, the set $\mathrm{pr}_i(S_1)$ is definable by the formula $\exists x_i(P_1)$. The connectives
$\Rightarrow$ and $\Leftrightarrow$ and the quantifier $\forall$ give nothing new, since without changing
the set being defined, we may replace them by combinations of the logical
operations already discussed: $\forall x$ may be replaced by $\neg \exists x \neg$, and so on.
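
As a small worked instance of steps (a) and (c), consider the set of even numbers. The polynomial $F$ below is chosen here purely for illustration and does not come from the text.

```latex
\begin{align*}
F(k_1, k_2) &= k_1 - 2k_2, \\
S &= \{(k_1, k_2) \mid F(k_1, k_2) = 0\} \subset N^2
   \quad\text{is defined by}\quad x_1 = (1 + 1) \cdot x_2, \\
\mathrm{pr}_2(S) &= \{k_1 \mid k_1 = 2k_2 \text{ for some } k_2 \in N\} \subset N
   \quad\text{is defined by}\quad \exists x_2\,(x_1 = (1 + 1) \cdot x_2).
\end{align*}
```

The monomial with positive coefficient, $k_1$, gives the left-hand term, the monomial with negative coefficient, $2k_2$, gives the right-hand term, and projecting away the second coordinate corresponds to the existential quantifier on $x_2$.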

This first description of arithmetical sets, i.e., $L_1\mathrm{Ar}$-definable sets, will be
greatly amplified in the second and third parts of the book. At this point it
is not immediately clear how to develop the subtler properties of definability,
such as the definability of the set of prime numbers in $N$ (see Example 3.14
in Chapter I), the definability of the set of partial fractions in the continued
fraction expansion of $\sqrt[3]{2}$, or the definability of the set of pairs


$$\{(i, \text{$i$th digit in the decimal expansion of } \pi)\} \subset N^2.$$


However, as we shall see in §11 and in Chapter VII, the “Gödel numbers of the
true formulas of arithmetic” form a much more complicated set, and this set is
not definable.

We now give several simple technical results. 


2.10. Proposition. Let $P$ be a formula in $L$, $\phi$ an interpretation in $M$, and
$\xi, \xi' \in \overline{M}$. Suppose that $x^\xi$ coincides with $x^{\xi'}$ for all variables $x$ occurring freely
in $P$. Then $|P|_\phi(\xi) = |P|_\phi(\xi')$.


2.11. Corollary. In any interpretation the closed formulas $P$ have well-defined
truth values: $|P|_\phi(\xi)$ does not depend on $\xi$.


PROOF.

(a) Let $t$ be a term, and suppose that for any variable $x$ in $t$ we have $x^\xi = x^{\xi'}$.
Then Lemma 1.4 and induction on the length of $t$ give $t^\xi = t^{\xi'}$.

(b) Assertion 2.10 holds for atomic formulas $P$ of the form $p(t_1, \ldots, t_r)$.
In fact,


$$|P|(\xi) = \begin{cases} 1, & \text{if } (t_1^\xi, \ldots, t_r^\xi) \in \phi(p), \\ 0, & \text{otherwise,} \end{cases}$$


and similarly for $|P|(\xi')$. But if $\xi$ and $\xi'$ coincide on all the variables in $P$ (all
of which occur freely), then a fortiori they coincide on all the variables in $t_i$,
and by part (a), we have $t_i^\xi = t_i^{\xi'}$, $i = 1, \ldots, r$. Therefore $|P|(\xi) = |P|(\xi')$.

(c) We now use induction on the total number of connectives and quantifiers
in $P$. If $P$ has the form $\neg Q$ or $Q_1 * Q_2$, then 2.10 for $P$ follows trivially from 2.10
for $Q$, $Q_1$, $Q_2$. Now suppose that $P$ has the form $\forall x(Q)$, and that 2.10 holds for
$Q$. (The case $\exists x(Q)$ can be treated analogously or can be reduced to the case
$\forall x$ by replacing $\exists x$ by $\neg \forall x \neg$.) By definition, we have


$$|\forall x Q|(\xi) = \begin{cases} 1, & \text{if } |Q|(\eta) = 1 \text{ for all variations } \eta \text{ of } \xi \text{ along } x, \\ 0, & \text{otherwise;} \end{cases}$$


$$|\forall x Q|(\xi') = \begin{cases} 1, & \text{if } |Q|(\eta') = 1 \text{ for all variations } \eta' \text{ of } \xi' \text{ along } x, \\ 0, & \text{otherwise.} \end{cases}$$


On the right we may let $\eta$ and $\eta'$ vary in addition on all variables that do not
occur freely in $Q$. The assertions after the word “if” remain true or false in
this wider range of values if they were true or false before, by the induction
hypothesis on $Q$. But then $\eta$ and $\eta'$ run through the same values, because $\xi$
and $\xi'$ differ only on variables that do not occur freely in $Q$, and on $x$. The
proposition is proved.


The following almost obvious fact is the basis for many phenomena that 
attest to the inadequacy of formal languages for completely describing intuitive 
concepts (see “Skolem’s paradox” below): 


2.12. Proposition. The cardinality of the class of $\phi$-definable sets does not
exceed

    card(alphabet of $L$) + $\aleph_0$.


Here and below, by “card(alphabet of $L$)” we mean the cardinality of the
alphabet of $L$ without the set of variables.


PROOF. If the language has $\leq \aleph_0$ variables, then there are at most
card(alphabet of $L$) + $\aleph_0$ formulas.


If, on the other hand, it has an uncountable set of variables, then we note that 
every definable set can be defined by a formula whose variables belong to a 
fixed countable subset of the variables that is chosen once and for all. 


2.13. Corollary. If $M$ is infinite and card(alphabet of $L$) $< 2^{\mathrm{card}\, M}$, then
“almost all” sets are undefinable.


Thus, the only way to define all subsets of M is to include a tremendous 
number of names in the language. For languages that are to describe actual 
mathematical reasoning this is an unrealistic program. Essentially, any finitely 
describable collection of modes of expression allows us to define only a countable 
number of sets. However, it is often technically useful to include in the alphabet, 
for example, names for all the elements of M. 

In the following sections we proceed to study systematically sets of true 
formulas. 


3 Syntactic Properties of Truth 


Let $L$ be a language in $\mathcal{L}_1$, let $\phi$ be an interpretation of $L$, and let $T_\phi L$ be the
set of $\phi$-true formulas. In this section we list some properties of $T_\phi L$ that reflect
the logic inherent in languages of $\mathcal{L}_1$, regardless of the specific nature of the
interpretation $\phi$.


3.1. The set $T_\phi L$ is complete. By definition, this means that for any closed
formula $P$, either $P$ or $\neg P$ lies in $T_\phi L$. This property follows from Corollary 2.11
above.


3.2. The set $T_\phi L$ does not contain a contradiction, that is, there is no formula
$P$ for which $P$ and $\neg P$ both lie in $T_\phi L$. In fact, $T_\phi L = \{P \mid |P|_\phi = 1\}$, while
$|\neg P|_\phi = 1 - |P|_\phi$.


3.3. The set $T_\phi L$ is closed under the rules of deduction MP (modus ponens) and
Gen (generalization). By definition, this means that if $P$ and $P \Rightarrow Q$ lie in $T_\phi L$,
then $Q$ also lies in $T_\phi L$, and that if $P$ lies in $T_\phi L$, then $\forall x P$ lies in $T_\phi L$ for any
variable $x$. The verification is immediate: if $|P|_\phi = 1$ and $|P \Rightarrow Q|_\phi = 1$, then
we must have $|Q|_\phi = 1$; if $|P|_\phi(\xi) = 1$ for all $\xi$, then also $|\forall x P|_\phi(\xi) = 1$. The
formula $Q$ is called a direct consequence of the formulas $P$ and $P \Rightarrow Q$ using
the rule of deduction MP. The formula $\forall x P$ is called a direct consequence of the
formula $P$ using the rule of deduction Gen.

The intuitive meaning of these rules of deduction is as follows. The rule 
MP corresponds to the following type of argument: “If P is true, and if the 
truth of P implies the truth of Q, then Q is true.” Thus, one might say that 
the semantics of the expression “if ... then” in natural languages is divided 
between the semantics of the connective $\Rightarrow$ and the semantics of the rule of
deduction MP in languages of $\mathcal{L}_1$. Neglecting this point of view often leads to
confusion when one attempts to explain the rules for assigning truth values to
the formula $P \Rightarrow Q$.

The rule Gen corresponds to the practice in mathematics of writing “identities”
or universally true assertions. When we write $(a + b)^2 = a^2 + 2ab + b^2$ or “in
a right triangle the square of the hypotenuse is equal to the sum of the squares
of the other two sides,” the quantifiers $\forall a\, \forall b$ and $\forall$ triangles are omitted.
Putting the quantifiers back in does not change the truth values, and has the 
advantage of freeing the notation for later use. 


3.4. The set $T_\phi L$ contains all tautologies. To define what a tautology is, we first
introduce the notion of a logical polynomial over a set of formulas $\mathcal{E}$. This is an
element in the minimal set of formulas that contains $\mathcal{E}$ and is closed with respect
to constructing formulas from shorter formulas using logical connectives.

A sequence of formulas $P_1, \ldots, P_n$ and representations of each $P_i$, either in
the form $Q$, where $Q \in \mathcal{E}$, or in the form $\neg Q$ or $Q_1 * Q_2$, where $Q, Q_1, Q_2$
lie in $\{P_1, \ldots, P_{i-1}\}$, is called a representation of $P_n$ as a logical polynomial
over $\mathcal{E}$. The representation of $P_n$ is not necessarily unique: for example, if
$\mathcal{E} = \{P, Q, P \Rightarrow Q\}$, then $P \Rightarrow Q$ has two representations.

Let $|\ | : \mathcal{E} \to \{0, 1\}$ be any map. If we are given a representation $r$ of the
formula $P_n$ as a logical polynomial over $\mathcal{E}$, then we can use the formulas in 2.5
to determine $|P_n|_r$ recursively.

A formula $P$ is called a tautology if there exist a set of formulas $\mathcal{E}$ and a
representation $r$ of $P$ as a logical polynomial over $\mathcal{E}$ such that $|P|_r = 1$ for all
maps $|\ | : \mathcal{E} \to \{0, 1\}$. The property of being a tautology is effectively decidable,
since by syntactically analyzing $P$ we can enumerate all representations of $P$
as a logical polynomial. All tautologies obviously belong to $T_\phi L$.

Here are our first examples of tautologies: 


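A0. $P \Rightarrow P$;

A1. $P \Rightarrow (Q \Rightarrow P)$;

A2. $(P \Rightarrow (Q \Rightarrow R)) \Rightarrow ((P \Rightarrow Q) \Rightarrow (P \Rightarrow R))$;

A3. $(\neg Q \Rightarrow \neg P) \Rightarrow ((\neg Q \Rightarrow P) \Rightarrow Q)$;

B1. $\neg\neg P \Rightarrow P$, $P \Rightarrow \neg\neg P$;

B2. $\neg P \Rightarrow (P \Rightarrow Q)$.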


Here $P$, $Q$, and $R$ are arbitrary formulas in $L$; the form in which these tautologies
are written makes it clear what representation as a logical polynomial over
$\{P, Q, R\}$ is intended.

Thus, tautologies are formulas that are true regardless of the truth or falsity
of the component parts (if the notion of component is suitably chosen). B1 is
the law of the excluded middle: a double negation is equivalent to the original
assertion. B2 is the mechanism by which a contradiction in a set of formulas $\mathcal{E}$
in $L$ leads to the deducibility of any formula, and thereby destroys the entire
system. (See Proposition 4.2 below.)


EXAMPLE OF HOW A TAUTOLOGY IS VERIFIED. We give three versions of how
to verify that the simple formula A1 is a tautology.

Version (a). By the formulas in 2.5, we have


$$|P \Rightarrow (Q \Rightarrow P)| = 1 - |P| + |P|\,|Q \Rightarrow P|
= 1 - |P| + |P|\,(1 - |Q| + |Q||P|) = 1,$$


since $|P|^2 = |P|$.


Version (b). We tabulate $|P \Rightarrow (Q \Rightarrow P)|$ as a function of $|P|$ and $|Q|$:


    |P|    |Q|    |Q => P|    |P => (Q => P)|
     0      0        1               1
     0      1        0               1
     1      0        1               1
     1      1        1               1


This is an example of a “truth table.” 

Version (c). The basic property of the connective $\Rightarrow$ is that $P \Rightarrow Q$ is false
only if $P$ is true and $Q$ is false. If $P \Rightarrow (Q \Rightarrow P)$ were false, then $P$ would be
true and $Q \Rightarrow P$ would be false; then, in turn, $Q$ would be true and $P$ would
be false, a contradiction.

The reader would do well to verify that the more complicated axioms, for 
example A2, are tautologies, and to decide which of the three versions he prefers. 
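
Version (b) is easy to mechanize. The Python sketch below checks the listed formulas by running through all truth assignments, using the arithmetic formulas of 2.5. It is an illustration under assumptions: formulas are encoded as nested tuples over the letters "P", "Q", "R", only one representation as a logical polynomial (the one over these letters) is examined, and the names `value`, `letters`, `is_tautology`, `imp`, and `neg` are hypothetical.

```python
from itertools import product


def value(F, v):
    """Truth value of the logical polynomial F under the assignment v."""
    if isinstance(F, str):                # a formula of the base set
        return v[F]
    if F[0] == "not":
        return 1 - value(F[1], v)
    p, q = value(F[1], v), value(F[2], v)
    return {"imp": 1 - p + p * q, "iff": p * q + (1 - p) * (1 - q),
            "or": max(p, q), "and": min(p, q)}[F[0]]


def letters(F):
    """Base formulas occurring in F."""
    if isinstance(F, str):
        return {F}
    return set().union(*[letters(G) for G in F[1:]])


def is_tautology(F):
    """True if F takes the value 1 under every map from its letters to {0, 1}."""
    names = sorted(letters(F))
    return all(value(F, dict(zip(names, bits))) == 1
               for bits in product((0, 1), repeat=len(names)))


def imp(a, b):
    return ("imp", a, b)


def neg(a):
    return ("not", a)


axioms = {
    "A0": imp("P", "P"),
    "A1": imp("P", imp("Q", "P")),
    "A2": imp(imp("P", imp("Q", "R")), imp(imp("P", "Q"), imp("P", "R"))),
    "A3": imp(imp(neg("Q"), neg("P")), imp(imp(neg("Q"), "P"), "Q")),
    "B1 (first)": imp(neg(neg("P")), "P"),
    "B1 (second)": imp("P", neg(neg("P"))),
    "B2": imp(neg("P"), imp("P", "Q")),
}
results = {name: is_tautology(F) for name, F in axioms.items()}
```

Every entry of `results` comes out true, whereas a formula such as `imp("P", "Q")` is rejected. The cost grows as 2 to the number of letters, which is harmless for formulas of this size.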


3.5. The set $T_\phi L$ contains the “logical quantifier axioms,” that is, the
formulas


(a) $\forall x(P \Rightarrow Q) \Rightarrow (P \Rightarrow \forall x Q)$, if all the occurrences of $x$ in $P$ are bound.

(b) $\forall x\, \neg P \Leftrightarrow \neg \exists x P$.

(c) $\forall x P(x) \Rightarrow P(t)$, if $t$ is free for $x$ in $P$ (axiom of specialization). Here we use
the notation $P(t)$ for the result of substituting $t$ for each free occurrence of
$x$ in $P$. In all other respects $P$ and $Q$ are arbitrary formulas.

In 3.7 we verify that the formulas in 3.5 are $\phi$-true. The intuitive meaning
of these formulas is more or less clear. For example, the axiom of specialization
means that if $P(x)$ is true for all $x$, then $P(t)$ is also true, where $t$ is the name
of any object. The condition that $t$ must be free for $x$ is the rule of hygiene for
changing notation.
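
The need for the hygiene condition can be seen on a two-element set. The following Python sketch is an illustration under simplifying assumptions that are not in the text: the only terms are variables, the only relation is equality, formulas are nested tuples, and the names `subst` and `truth` are hypothetical. It takes $P(x) = \exists y\, \neg(x = y)$ and compares the substitution of $z$ (free for $x$ in $P$) with the substitution of $y$ (not free for $x$ in $P$).

```python
def subst(P, x, t):
    """Replace every free occurrence of the variable x in P by the variable t."""
    kind = P[0]
    if kind == "eq":
        return ("eq",) + tuple(t if s == x else s for s in P[1:])
    if kind == "not":
        return ("not", subst(P[1], x, t))
    if kind == "imp":
        return ("imp", subst(P[1], x, t), subst(P[2], x, t))
    if P[1] == x:                         # x is bound below this quantifier
        return P
    return (kind, P[1], subst(P[2], x, t))


def truth(P, M, xi):
    """Truth value of P at the point xi (dict: variable -> element of M)."""
    kind = P[0]
    if kind == "eq":
        return 1 if xi[P[1]] == xi[P[2]] else 0
    if kind == "not":
        return 1 - truth(P[1], M, xi)
    if kind == "imp":
        p, q = truth(P[1], M, xi), truth(P[2], M, xi)
        return 1 - p + p * q
    values = [truth(P[2], M, {**xi, P[1]: m}) for m in M]
    return min(values) if kind == "forall" else max(values)


M = [0, 1]
P = ("exists", "y", ("not", ("eq", "x", "y")))   # P(x): some y differs from x
xi = {"x": 0, "y": 0, "z": 0}

# z is free for x in P: this is an instance of the axiom of specialization.
good = ("imp", ("forall", "x", P), subst(P, "x", "z"))
# y is NOT free for x in P: the substituted y is captured by the quantifier.
bad = ("imp", ("forall", "x", P), subst(P, "x", "y"))
```

Here `good` is true at `xi`, while `bad` is false: after the careless substitution the right-hand side reads "some y differs from y", which no element satisfies, although the left-hand side remains true.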

The set 


    Ax $L$ = {tautologies of $L$} $\cup$ {quantifier axioms}


is called the set of logical axioms in the language L. 

A set of formulas $\mathcal{E}$ in $L$ will be called Gödelian if it is complete, does not
contain a contradiction, is closed with respect to the rules of deduction MP
and Gen, and contains all the logical axioms of $L$. The basic conclusion of our
discussion is then the following:


3.6. Proposition. The set of true formulas of $L$ (in any interpretation) is
Gödelian.


In §6 we prove that conversely, any Gödelian set is a set of true formulas
in a suitable interpretation. Thus, the concept of a Gödelian set is the closest
approximation to the concept of truth that can be attained “without regard to
meaning.”


3.7. Verification that axioms 3.5 are true. 


(a) Let $R$ be the formula 3.5(a). We suppose that $|R|(\xi) = 0$ for some $\xi \in \overline{M}$
and show that this leads to a contradiction.

In fact, then $|\forall x(P \Rightarrow Q)|(\xi) = 1$ and $|P \Rightarrow \forall x Q|(\xi) = 0$. The second
equation implies that $|P|(\xi) = 1$ and $|\forall x Q|(\xi) = 0$. Let $\xi'$ be a variation of $\xi$
along $x$ for which $|Q|(\xi') = 0$. Then $|P|(\xi') = |P|(\xi) = 1$ by Proposition 2.10,
since $x$ does not occur freely in $P$. Hence, $|P \Rightarrow Q|(\xi') = 0$, which contradicts
the relation $|\forall x(P \Rightarrow Q)|(\xi) = 1$.

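(b) For all $\xi \in \overline{M}$, with $\xi'$ running through all variations of $\xi$ along $x$, we have


$$|\forall x\, \neg P|(\xi) = \min_{\xi'} |\neg P|(\xi') = 1 - \max_{\xi'} |P|(\xi'),$$


$$|\neg \exists x P|(\xi) = 1 - \max_{\xi'} |P|(\xi').$$


Hence, the truth values of $\forall x\, \neg P$ and $\neg \exists x P$ coincide, so that
$\forall x\, \neg P \Leftrightarrow \neg \exists x P$ is identically true.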


(c) Suppose that $|\forall x P(x) \Rightarrow P(t)|(\xi) = 0$ for some point $\xi \in \overline{M}$. We show that
this leads to a contradiction. In fact, then


$$|\forall x P(x)|(\xi) = 1, \qquad |P(t)|(\xi) = 0.$$


The first equation implies that $|P(x)|(\xi') = 1$ for all variations $\xi'$ of $\xi$ along $x$.
For $\xi'$ we take the variation such that $x^{\xi'} = t^\xi$. If we prove that $|P(t)|(\xi) =
|P(x)|(\xi')$, then we obtain the desired contradiction.

We prove this by induction on the total number of connectives and
quantifiers in $P$.

($c_1$) Let $P$ be an atomic formula $p(t_1, \ldots, t_r)$. Letting $t_i'$ denote the result
of substituting $t$ for each occurrence of $x$ in $t_i$, we successively obtain


$t^\xi = x^{\xi'}$ (by the definition of $\xi'$),

$t_i'^{\,\xi} = t_i^{\xi'}$ (by induction on the length of $t_i$),

$|P(t)|(\xi) = |p(t_1', \ldots, t_r')|(\xi) = |p(t_1, \ldots, t_r)|(\xi') = |P(x)|(\xi')$.


($c_2$) Let $P$ have the form $\neg Q$ or $Q_1 * Q_2$, where $*$ is a connective. Since
$x$ does not bind $t$ in $P$ by assumption, the same is true for $Q$, $Q_1$, and $Q_2$, and
the necessary induction step is automatic.

($c_3$) Finally, let $P$ have the form $\exists y\, Q$ or $\forall y\, Q$. We shall examine the first
case; the proof for the second case is analogous.

Subcase 1. $y = x$. Then $x$ is bound in $P$; therefore, $P(x) = P(t)$, and
$|P|(\xi) = |P|(\xi')$ by Proposition 2.10.

Subcase 2. $y \neq x$. The induction assumption has the form $|Q(t)|(\eta) =
|Q(x)|(\eta')$ if $\eta$ is any point in $\overline{M}$ and $\eta'$ is a variation of $\eta$ along $x$ for which
$x^{\eta'} = t^\eta$. We must show that the following two truth values coincide (where $\xi$
and $\xi'$ are defined as above):


$$|\exists y\, Q(x)|(\xi') = \begin{cases} 1, & \text{if } |Q(x)|(\eta') = 1 \text{ for some variation } \eta' \text{ of } \xi' \text{ along } y, \\ 0, & \text{otherwise;} \end{cases}$$


$$|\exists y\, Q(t)|(\xi) = \begin{cases} 1, & \text{if } |Q(t)|(\eta) = 1 \text{ for some variation } \eta \text{ of } \xi \text{ along } y, \\ 0, & \text{otherwise.} \end{cases}$$


We recall that $\xi'$ is the variation of $\xi$ along $x$ for which $x^{\xi'} = t^\xi$.

We first suppose that the second truth value is 1. We choose a variation $\eta \in \overline{M}$
of $\xi$ along $y$ such that $|Q(t)|(\eta) = 1$, and then construct the variation $\eta'$ of $\eta$
along $x$ for which $x^{\eta'} = t^\eta$. Then, by the induction assumption,
$1 = |Q(t)|(\eta) = |Q(x)|(\eta')$. We show that $\eta'$ is a variation of $\xi'$ along $y$; this
will imply that the first truth value is also 1. In fact, $\eta'$ was obtained by
varying $\eta$ along $x$, $\eta$ was obtained by varying $\xi$ along $y$, and $\xi$ was obtained by
varying $\xi'$ along $x$. Hence, $\eta'$ is a variation of $\xi'$ along $x$ and $y$; we must show
the variation along $x$ did not actually take place:


$$x^{\eta'} = x^{\xi'}.$$


But the left-hand side is $t^\eta$ by the definition of $\eta'$; the right-hand side is $t^\xi$ by
the definition of $\xi'$; and $\eta$ was obtained by varying $\xi$ along $y$. Since $t$ is free for
$x$ in $P = \exists y\, Q$, it follows that $y$ does not occur in $t$, and therefore $t^\eta = t^\xi$.

It remains to verify that if the second truth value is 0, then the first is
also 0. The argument is almost the same. If the second truth value is 0, then
$|Q(t)|(\eta) = 0$ for all variations $\eta$ of $\xi$ along $y$. For each such $\eta$ we construct $\eta'$
as in the first part of the proof. As before, we verify that $\eta'$ is a variation of $\xi'$
along $y$ and, moreover, $\eta'$ runs through all such variations when $\eta$ runs through
all variations of $\xi$ along $y$. Hence, the first truth value is also 0.

The proposition is proved. 




Digression: Natural Logic 


1. Logic does not concern itself with the external world, but only with systems 
for trying to understand it. The logic of one such system—mathematics—is 
normalized to such an extent that it resembles a rigid stencil, which we can 
attempt to impose on any other system. But whether this stencil fits the system 
should not be seen as the criterion of suitability or the measure of worth of 
the system. The physicist’s descriptions do not have to form a consistent or 
coherent whole; his job is to describe nature effectively on certain levels. Natural 
languages and the spontaneous workings of the mind are even less logical. In 
general, adherence to logical principles is only a condition for effectiveness in 
certain narrowly specialized spheres of human endeavor. 

Although comparisons between the logic of predicates and the logic of natural
languages or their subsystems have no normative force, such comparisons
may be interesting and enlightening. Here we give some selected material from
linguistics and psychology.

2. B. Russell, K. Dohmann, H. Reichenbach, U. Weinreich, and many
others have studied the problem of finding parallels in natural languages for
categories that can be formalized in languages of $\mathcal{L}_1$ and of cataloguing the
methods of transmitting these categories. This leads to the grouping of words
into so-called logico-semantic classes, instead of the traditional division into
verbs, nouns, articles, etc. (A. V. Gladkii and I. A. Mel’čuk, Éléments de
linguistique mathématique, Paris, Dunod, 1972, 86).

For example, the words sleeps, smart, crybaby are parallel to relation symbols 
(predicates) of rank 1; the words loves, friendly, sister correspond to relations 
of rank 2. For each of them we have atomic formulas, such as “N sleeps,” “X 
is friendly to Y,” and so on. 

“All, sometimes, something” are quantifier words; while “and, or, but, if... 
then” are, of course, connectives. “The nose, le cadeau” are constants. Nouns 
are made into constants by using the definite article or its semantic equiva- 
lent. In Russian, which does not have definite articles, one must either use the 
demonstrative articles etot (this), tot (that), or make it clear from the context 
that the noun is meant as a constant. The words nos (nose), podarok (gift) are 
more like variables that stand for any object satisfying the simple predicate “is 
a nose,” “is a gift.” Incidentally, there are other possible interpretations. 

The pronoun “he” is, without doubt, a variable. The pronouns “I” and
“you” have much more complicated semantics, involving a correlation with who
is speaking that does not exist in the speakerless languages of $\mathcal{L}_1$. Certain
aspects of the first person pronoun are included in the semantics of algorithmic
languages. The right type of “memory key” in a program for the IBM 360 will
allow the program to change what is contained in any byte in the basic memory
region. The memory guard asks “Who is there?”, and the program answers, “It
is I.” Finally, it is even possible in languages of $\mathcal{L}_1$ to find models for certain
types of self-description; see 9-11 and the digression on self-reference.

In Russian, “ili” (or) can be used not only to express the logical $\vee$, but
also to express the exclusive “or” and even to express conjunction $\wedge$, as in the
sentence “$x^2 > 0$ for $x > 0$ or for $x < 0$” (E. V. Padučeva). In Latin, the
functions of exclusive and inclusive “or” are expressed by two different words, aut
and vel. “And” can sometimes express a time sequence: compare the sentences
“Jane got married and had a baby” with “Jane had a baby and got married”
(S. Kleene). The conjunction $\wedge$ can be expressed in different languages by


    juxtaposition:              Chinese: ma mo (horse and donkey)
                                Swahili: shika kitabu usome (take a book and read)
    a preposition:              Russian: Petya s Masei (Peter and Masha)
    a conjunction:              and, i, et
    a postpositional particle:  Latin: senatus populusque (the senate and the people)
    two conjunctions:           Russian: kak ... tak.


Dohmann has catalogued the ways of expressing 16 logical polynomials in 
two variables in several languages of the world. 


3. Curious as all this material may be, it should be regarded critically; in such 
comparisons with logic, the subtleties of usage often elude us. As an example, 
let us analyze the natural semantics of “if... then.” We have already mentioned 
that in languages of $\mathcal{L}_1$ this connective corresponds not only to “$\Rightarrow$” but also to
the rule of deduction modus ponens. Moreover, MP more adequately represents 
the meaning of “if... then.” 

Actually, the rule that any conditional is true if its antecedent is known to 
be false has almost no parallel in natural logic. Examples of the type “if snow 
is black, then 2 x 2 = 5,” which keep cropping up in textbooks, are capable 
only of confusing the student, since no natural subsystem in our language has 
expressions with this semantics. A possible exception is certain poetic and ex- 
pressive formulas with extremely limited usage (“If she be false, O, then heaven 
mocks itself!”). Formal mathematics, in which a single contradiction destroys 
the entire system, clearly has the features of poetic hyperbole. 

Finally, in the logic of predicates there is no place at all for the modal
aspect of the use of “if... then” in instructions of the type “if this happens,
do that.” On the other hand, this aspect can easily be expressed by the
semantics of the connective “if... then ... else” in algorithmic languages such
as Algol. Unless one uses techniques suggested by algorithmic languages, any
attempt to find a model for modality in languages based on $\mathcal{L}_1$ is doomed to
failure (compare: A. A. Ivin, The Logic of Norms, MGU Press, 1973).

4. We have mentioned several times that the choice of the primitive modes
of expression in the logic of predicates does not reflect psychological reality. 
Elementary logical operations, even one-step deductions, may require a highly 
trained intellect; yet, logically complicated operations can often be performed 
as a single elementary act of thought even by a damaged brain. 


Sublieutenant Zasetsky, aged twenty-three, suffered a head injury 2 March 
1943 that penetrated the left parieto-occipital area of the cranium. The 
injury... was further complicated by inflammation that resulted in adhesions
of the brain to the meninges and marked changes in the adjacent tissues. 


Professor A. R. Luria met Zasetsky at the end of May 1943, and observed 
his condition for the next 26 years. In this time Zasetsky wrote nearly 3000 
pages, describing with agonizing effort his life and illness as he struggled to 
regain his reason. His notebooks, which provided the material for Luria’s book 
The Man with a Shattered World (Basic Books, Inc., New York, 1972, translated 
by L. Solotaroff), not only show his perseverance and determination, but are 
also revealing from a psychological point of view. 


At first, the destruction of Zasetsky’s psyche was overwhelming. The pre- 
dominant disorder was asemia, the inability to connect symbols with their 
meaning. Luria describes his first meeting with Zasetsky: 


“Try reading this page,” I suggested to him.
“What’s this?... No, I don’t know... don’t understand... what is this?... .” 


I suggested he try to do something simple with numbers, like add six and 
seven. 


“Seven ... six ... what’s it? No, I can’t ... just don’t know.” 

The ability to understand the simplest predicates was lost: “What 
season is there before winter? Before winter? After winter?... Summer?...
Or something! No, I can’t get it. Before spring? It’s spring now ... and ...
and before ... I’ve already forgotten, just can’t remember.” 

Zasetsky lost the ability to interpret the syntactic devices for organizing mean- 
ing: “In the school where Dunya studied a woman worker from the factory 
came to give a report.” What did this mean to him? Who gave the report— 
Dunya or the factory worker? And where was Dunya studying? Who came 
from the factory? Where did she speak? 


This is a fairly difficult example composed by Professor Luria, but here is what 
Zasetsky himself writes: 


I also had trouble with expressions like: “Is an elephant bigger than a fly?” 
and “Is a fly bigger than an elephant?” All I could figure out was that a 
fly is small and an elephant is big, but I didn’t understand the words bigger 
and smaller. The main problem was I couldn’t understand which word they 
referred to. 


What attracts our attention is the complexity of Zasetsky’s metalinguistic 
text describing his linguistic difficulties. The subtlety of the analysis seems 
incompatible with the crude errors being analyzed. This could be explained by 
the retrospective nature of the analysis, but the following even more complicated 
description was written concurrently with the experience of the mental defect 
being described: 
Sometimes I’ll try to make sense out of those simple questions about the
elephant and the fly, decide which is right or wrong. I know that when you 
rearrange the words, the meaning changes. At first I didn’t think it did, it 
didn’t seem to make any difference whether or not you rearranged the words. 
But after I thought about it a while I noticed that the sense of the four words 
(elephant, fly, smaller, larger) did change when the words were in a different 
order. But my brain, my memory, can’t figure out right away what the word 
smaller (or larger) refers to. So I always have to think about them for a while 
... so sometimes ridiculous expressions like “a fly is bigger than an elephant”
seem right to me, and I have to think about it a while longer. 


We can also see how complicated mental abilities were preserved while 
“simple” ones were lost from examples of Zasetsky’s creative imagination, which 
resemble literary-psychological studies: 


Say I’m a doctor examining a patient who is seriously ill. I’m terribly worried 
about him, grieve for him with all my heart. (After all, he’s human too, and 
helpless. I might become ill and also need help. But right now it’s him I’m 
worried about—I’m the sort of person who can’t help caring.) But say I’m
another kind of doctor—someone who is bored to death with patients and 
their complaints. I don’t know why I took up medicine in the first place, 
because I don’t really want to work and help anyone. I’ll do it if there’s
something in it for me, but what do I care if a patient dies? It’s not the first 
time people have died, and it won’t be the last. 


All of this shows that there is no basis whatsoever for Rosser’s opinion that 
“once the proof is discovered, and stated in symbolic logic, it can be checked by 
a moron.” The human mind is not at all well suited for analyzing formal texts. 


4 Deducibility 


4.1. Definition. A deduction of a formula $P$ from a set of formulas $\mathcal{E}$ (in a
language $L$ in $\mathcal{L}_1$) is a finite sequence of formulas $P_1, \dots, P_n = P$ with the
property that for each $i = 1, \dots, n$ at least one of the following alternatives
holds:

(a) $P_i \in \mathcal{E}$;
(b) $\exists j < i$ such that $P_i$ is a direct consequence of $P_j$ using Gen;
(c) $\exists j, k < i$ such that $P_i$ is a direct consequence of $P_j$ and $P_k$ using MP.


We shall write $\mathcal{E} \vdash P$ to abbreviate “there exists a deduction of $P$ from $\mathcal{E}$.”
A deduction of $P$, together with a precise indication for each $i \le n$ of which of
the alternatives (a), (b), (c) and which indices $j$ in case (b) or $j, k$ in case (c)
are used to obtain $P_i$, is called a description of a deduction. A single deduction
may have several descriptions.
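
The bookkeeping in Definition 4.1 can be made concrete with a small checker. What follows is only a sketch under assumptions that are not in the text: formulas are encoded as nested Python tuples, the set $\mathcal{E}$ is supplied as a membership test `in_E`, and the names `imp`, `forall`, and `check_description` are hypothetical. The structure of Ax $L$ and any side conditions are not modeled.

```python
def imp(a, b):
    """Encode the formula a => b (hypothetical tuple encoding)."""
    return ("imp", a, b)


def forall(x, a):
    """Encode the formula (forall x) a."""
    return ("all", x, a)


def check_description(steps, in_E):
    """Check a description of a deduction in the sense of Definition 4.1.

    steps is a list of (formula, justification) pairs, where justification is
      ("hyp",)      alternative (a): the formula lies in E
      ("gen", j)    alternative (b): Gen applied to step j
      ("mp", j, k)  alternative (c): MP applied to step j and step k,
                    where step k must be (step j) => formula
    Indices are 0-based and must point to strictly earlier steps.
    """
    for i, (formula, just) in enumerate(steps):
        kind = just[0]
        if kind == "hyp":
            ok = in_E(formula)
        elif kind == "gen":
            j = just[1]
            ok = (0 <= j < i and isinstance(formula, tuple)
                  and len(formula) == 3 and formula[0] == "all"
                  and formula[2] == steps[j][0])
        elif kind == "mp":
            j, k = just[1], just[2]
            ok = (0 <= j < i and 0 <= k < i
                  and steps[k][0] == imp(steps[j][0], formula))
        else:
            ok = False
        if not ok:
            return False
    return True


# Hypothetical example: E contains P, not-P and the formula not-P => (P => Q).
not_P = ("not", "P")
E = {"P", not_P, imp(not_P, imp("P", "Q"))}
description = [
    (imp(not_P, imp("P", "Q")), ("hyp",)),
    (not_P, ("hyp",)),
    (imp("P", "Q"), ("mp", 1, 0)),   # MP on steps 1 and 0
    ("P", ("hyp",)),
    ("Q", ("mp", 3, 2)),             # MP on steps 3 and 2
]
valid = check_description(description, lambda f: f in E)
gen_valid = check_description(
    [("P", ("hyp",)), (forall("x", "P"), ("gen", 0))], lambda f: f in E)
bogus = check_description([("Q", ("hyp",))], lambda f: f in E)
```

The sample description deduces $Q$ from $P$, $\neg P$, and $\neg P \Rightarrow (P \Rightarrow Q)$ by two applications of MP. The same sequence of formulas could carry a different list of justifications, which is the sense in which a single deduction may have several descriptions.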

We usually consider deductions from sets $\mathcal{E}$ that contain Ax $L$, the logical
axioms of $L$. The other elements of $\mathcal{E}$ may be formulas of $L$ that are “guessed”
to be true in the standard interpretation; these are called special axioms of $L$.
(Examples will be given later in 4.6-4.9.) Such deductions may be considered
the formal equivalents of mathematical proofs (of a formula $P = P_n$ from the
hypotheses $\mathcal{E}$). This identification is justified for the following reasons:


(a) As shown in 3.3, if $\mathcal{E} \subset T_\phi L$ for some interpretation $\phi$, and if $\mathcal{E} \vdash P$, then
    $P \in T_\phi L$; only true formulas can be deduced from true formulas.

(b) A large amount of experimental work has been done on formalizing
    mathematical proofs, that is, replacing them by deductions in suitable languages
    of $\mathcal{L}_1$, especially $L_1\mathrm{Set}$. This work has shown that for large segments
    of mathematics, including the foundations of the theory of integers and
    real numbers, set theory, and so on, proofs can successfully be
    formalized as deductions within the framework of $\mathcal{L}_1$. There is much material
    on this theme in the literature on mathematical logic; see, in particular,
    Mendelson’s book.

(c) Gödel’s completeness theorem for the logical modes of expression in $\mathcal{L}_1$
    (see §6) shows that any formula that is not deducible from $\mathcal{E}$ must be false
    in some model (interpretation) of $\mathcal{E}$.


For further discussion, see “Digression: Proof.”

We occasionally consider deductions from another type of sets $\mathcal{E}$.
For example, we might remove from $\mathcal{E}$ certain logical axioms, such as the “law
of the excluded middle” (B, in Section 3.4), in order to investigate formally
intuitionistic principles. Or we might add to $\mathcal{E}$ a formula that we think is false
in order to deduce a contradiction from $\mathcal{E}$; this is the so-called “proof by
contradiction.”

We now prove some formal aspects of contradiction. 


4.2. Proposition. Suppose that $\mathcal{E}$ contains all tautologies of type B.2 in
Subsection 3.4. Then the following two properties of $\mathcal{E}$ are equivalent:


(a) There exists a formula $P$ such that $\mathcal{E} \vdash P$ and $\mathcal{E} \vdash \neg P$.
(b) $\mathcal{E} \vdash Q$ for any formula $Q$.


A set $\mathcal{E}$ with these properties is called inconsistent.


PROOF. (b) $\Rightarrow$ (a) is obvious. Conversely, suppose $\mathcal{E} \vdash P$ and $\mathcal{E} \vdash \neg P$. We first
add the formula $\neg P \Rightarrow (P \Rightarrow Q)$, which is assumed to lie in $\mathcal{E}$, to the
descriptions of the two deductions. Then, applying MP twice (to this formula and $\neg P$;
then to $P \Rightarrow Q$ and $P$), we obtain a description of a deduction $\mathcal{E} \vdash Q$.


4.3. A large part of the theorems of logic consists in proving assertions of the
type “$\mathcal{E} \vdash P$” or “it is not true that $\mathcal{E} \vdash P$” for various languages $L$, sets $\mathcal{E}$, and
(classes of) formulas $P$.

A result of the form $\mathcal{E} \vdash P$ may be proved by presenting a description of a
deduction of $P$ from $\mathcal{E}$. However, even in slightly complicated cases, this
procedure becomes so long that it is replaced by more or less complete instructions
on how to compose such a description. Finally, “$\mathcal{E} \vdash P$” may be proved without
presenting even an incomplete description of a deduction of $P$ from $\mathcal{E}$. In this
case we “are not proving $P$, but are proving that a proof of $P$ exists”; see the
example in §8 concerning language extensions.

In rare cases a result of the form “it is not true that $\mathcal{E} \vdash P$” can be proved
by a purely syntactic argument. But usually such a result is obtained by
constructing a model, i.e., an interpretation, in which $\mathcal{E}$ is true and $P$ is false; see
the discussion of the continuum problem in Chapters III-IV. If it is true neither
that $\mathcal{E} \vdash P$ nor that $\mathcal{E} \vdash \neg P$, we say that $P$ is independent of $\mathcal{E}$.

We now give two useful elementary results concerning deductions. It is 
clear that compared with usual proofs, deductions are made up of very minor 
details. The mathematician, as if wearing seven-league boots, covers entire fields 
of formal deductions in one step. 


4.4. Lemma. Suppose that $\mathcal{E}$ contains all tautologies. If $\mathcal{E} \vdash P$ and $\mathcal{E} \vdash Q$, then
$\mathcal{E} \vdash P \wedge Q$.


PROOF. If $P_1, \dots, P_m$ and $Q_1, \dots, Q_n$ are deductions of $P$ and $Q$, respectively,
then


$P_1, \dots, P_m;\ Q_1, \dots, Q_n,\ P \Rightarrow (Q \Rightarrow (P \wedge Q)),\ Q \Rightarrow (P \wedge Q),\ P \wedge Q$


is a deduction of $P \wedge Q$. The third formula from the end is a tautology; the
second formula from the end is a direct consequence of this tautology and
$P_m = P$ using MP; and the last formula is a direct consequence of the second
to last and $Q_n = Q$ using MP.


4.5. Deduction Lemma. Suppose that $\mathcal{E} \supset$ Ax $L$ and $P$ is a closed formula.
If $\mathcal{E} \cup \{P\} \vdash Q$, then $\mathcal{E} \vdash P \Rightarrow Q$.


PROOF. Let $Q_1, \dots, Q_n = Q$ be a deduction of $Q$ from $\mathcal{E} \cup \{P\}$. We show by
induction on $n$ that there exists a deduction of $P \Rightarrow Q$ from $\mathcal{E}$.


(a) $n = 1$. Then either $Q \in \mathcal{E}$, or else $Q = P$. In the first case $P \Rightarrow Q$ is
    deduced from $Q$ and the tautology $Q \Rightarrow (P \Rightarrow Q)$ using MP. In the second
    case $P \Rightarrow P$ is a tautology.

(b) $n \ge 2$. We assume that the lemma holds for deductions of length $\le n - 1$.
    Then $\mathcal{E} \vdash P \Rightarrow Q_i$ for all $i \le n - 1$. Further, we have the following
    possibilities for $Q_n = Q$: (b1) $Q \in \mathcal{E}$; (b2) $Q = P$; (b3) $Q$ is deduced from $Q_i$ and
    $Q_j = (Q_i \Rightarrow Q)$ using MP; and (b4) $Q$ has the form $\forall x\, Q_j$ for $j \le n - 1$.
    The first two cases are handled in exactly the same way as for $n = 1$.


In case (b3), $P \Rightarrow Q$ can be deduced from $\mathcal{E}$ in the following way:


(1) deduction of $P \Rightarrow Q_i$ (induction assumption);
(2) deduction of $P \Rightarrow (Q_i \Rightarrow Q)$ (induction assumption);
(3) $(P \Rightarrow (Q_i \Rightarrow Q)) \Rightarrow ((P \Rightarrow Q_i) \Rightarrow (P \Rightarrow Q))$ (tautology);
(4) $(P \Rightarrow Q_i) \Rightarrow (P \Rightarrow Q)$ (from (2) and (3) using MP);
(5) $P \Rightarrow Q$ (from (1) and (4) using MP).


From now on, arguments of this sort will be presented more briefly, with explicit
mention of only the last steps of the induction (here (3), (4), and (5)).

Finally, in case (b4), we obtain a deduction of $P \Rightarrow \forall x\, Q_j$ from $\mathcal{E}$ if we add
the following formulas to the deduction of $P \Rightarrow Q_j$ from $\mathcal{E}$ (which exists by the
induction assumption):


$\forall x (P \Rightarrow Q_j)$ (Gen)
$\forall x (P \Rightarrow Q_j) \Rightarrow (P \Rightarrow \forall x\, Q_j)$ (logical quantifier axiom, since $P$ is closed)
$P \Rightarrow \forall x\, Q_j$ (MP applied to the two preceding formulas).


The lemma is proved.
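
The induction in this proof is effectively a procedure that rewrites a deduction step by step. The sketch below carries it out for the propositional fragment only, so it covers the hypothesis cases and the MP case and leaves out the Gen case. It rests on several assumptions that are not in the text: formulas are nested tuples built with a hypothetical `imp`, the only connective is implication, and tautologies are recognized by a truth table rather than as instances of a fixed list. All function names are illustrative.

```python
from itertools import product


def imp(a, b):
    """Encode a => b; atoms are plain strings (hypothetical encoding)."""
    return ("imp", a, b)


def atoms(f):
    return {f} if isinstance(f, str) else atoms(f[1]) | atoms(f[2])


def value(f, v):
    return v[f] if isinstance(f, str) else (not value(f[1], v)) or value(f[2], v)


def is_tautology(f):
    names = sorted(atoms(f))
    return all(value(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))


def deduction_lemma(steps, p):
    """Turn a description of a deduction of Q from E + {p} into one of p => Q from E.

    A step is (formula, just) with just = ("hyp",) or ("mp", j, k), meaning
    that step k is (step j) => formula. The output may also use ("taut",).
    """
    out, pos = [], []            # pos[i]: index in out of the formula p => Q_i
    for q, just in steps:
        if just[0] == "hyp" and q == p:
            out.append((imp(p, p), ("taut",)))
        elif just[0] == "hyp":
            out.append((q, ("hyp",)))
            out.append((imp(q, imp(p, q)), ("taut",)))
            out.append((imp(p, q), ("mp", len(out) - 2, len(out) - 1)))
        else:
            j, k = just[1], just[2]
            a = imp(p, imp(steps[j][0], q))   # p => (Q_j => q), found at pos[k]
            b = imp(p, steps[j][0])           # p => Q_j, found at pos[j]
            out.append((imp(a, imp(b, imp(p, q))), ("taut",)))
            out.append((imp(b, imp(p, q)), ("mp", pos[k], len(out) - 1)))
            out.append((imp(p, q), ("mp", pos[j], len(out) - 1)))
        pos.append(len(out) - 1)
    return out


def check(steps, in_E):
    """Verify every step of a description against E, tautologies and MP."""
    for i, (f, just) in enumerate(steps):
        if just[0] == "hyp":
            ok = in_E(f)
        elif just[0] == "taut":
            ok = is_tautology(f)
        else:
            j, k = just[1], just[2]
            ok = 0 <= j < i and 0 <= k < i and steps[k][0] == imp(steps[j][0], f)
        if not ok:
            return False
    return True


# Hypothetical example: E = {P => A, A => B}; B is deduced from E + {P}.
E = {imp("P", "A"), imp("A", "B")}
original = [
    ("P", ("hyp",)),
    (imp("P", "A"), ("hyp",)),
    ("A", ("mp", 0, 1)),
    (imp("A", "B"), ("hyp",)),
    ("B", ("mp", 2, 3)),
]
converted = deduction_lemma(original, "P")
```

In the example, `converted` ends with the formula $P \Rightarrow B$ and uses only members of $\mathcal{E}$, tautologies, and MP; the hypothesis $P$ itself no longer appears as a justification. Each MP step of the original costs three steps in the output, which corresponds to items (3), (4), and (5) of the proof.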


We record for future reference that in the parts of deductions constructed 
in Lemmas 4.4 and 4.5, only tautologies of the types A0, A1, and A2 in
Section 3.4 were used. 

We now give some basic examples of special axioms. 


Axioms of equality 


Let $L$ be a language in $\mathcal{L}_1$ whose alphabet includes a relation $=$ of rank two.
We shall write $t_1 = t_2$ instead of $=(t_1, t_2)$. If $P$ is a formula, $x$ is a variable, and
$t$ is a term, we let $P(x, t)$ denote the result of substituting $t$ in $P$ in place of
any or all of the free occurrences of $x$ in $P$ for which $t$ is free.


4.6. Proposition.
(a) The formulas


$t = t$; $t_1 = t_2 \Leftrightarrow t_2 = t_1$; $t_1 = t_2 \wedge t_2 = t_3 \Rightarrow t_1 = t_3$;
$x = t \Rightarrow (P(x, x) \Rightarrow P(x, t))$


are $\phi$-true for any interpretation $\phi$ of $L$ in which $\phi(=)$ is equality.
(b) All the formulas in (a) are deducible from the set


Ax $L$ $\cup\ \{x = x \mid x \text{ is a variable}\}$
$\cup\ \{x = y \Rightarrow (P(x, x) \Rightarrow P(x, y)) \mid P \text{ is an atomic formula}\}$.


The formulas in this list, except for Ax $L$, are called the axioms of equality.


(c) Let $\phi$ be any interpretation of $L$ in a set $M$ for which the axioms of equality
are true. Then $\phi(=)$ is an equivalence relation in $M$ that is compatible
with the interpretations of all the relations and operations of $L$ in $M$. If $\phi'$
denotes the obvious interpretation of $L$ in the quotient set $M' = M/\phi(=)$,
then $\phi'(=)$ is equality, and $T_\phi L = T_{\phi'} L$.


PROOF (SKETCH)


(a) The $\phi$-truth is easily established. We illustrate this by showing that
the last formula is $\phi$-true. Suppose it were false at a point $\xi \in \overline{M}$. Then
$|x = t|(\xi) = 1$, $|P|(\xi) = 1$ and $|P(x, t)|(\xi) = 0$. The first assertion means
that $x^\xi = t^\xi$. But then $|P|(\xi) = |P(x, t)|(\xi)$ by Proposition 2.10, contradicting
the second and third assertions.

(b) Deduction of $t = t$: $x = x$ (axiom of equality); $\forall x (x = x)$ (Gen);
$\forall x (x = x) \Rightarrow t = t$ (logical axiom of specialization); $t = t$ (MP).
Deduction of $t_1 = t_2 \Leftrightarrow t_2 = t_1$:


(1) $x = y \Rightarrow (x = x \Rightarrow y = x)$ (axiom of equality with $x = x$ for $P$);

(2) $Q \Rightarrow ((P \Rightarrow (Q \Rightarrow R)) \Rightarrow (P \Rightarrow R))$, where $P$ is $x = y$, $Q$ is $x = x$, $R$ is
    $y = x$ (tautology);

(3) $x = x$ (axiom of equality);

(4) $(P \Rightarrow (Q \Rightarrow R)) \Rightarrow (P \Rightarrow R)$ (MP is applied to (2) and (3));

(5) $x = y \Rightarrow y = x$ (MP applied to (1) and (4)).


We then twice apply Gen, the axiom of specialization, and MP, in order to
deduce the formula $t_1 = t_2 \Rightarrow t_2 = t_1$ from (5); we replace $t_1$ by $t_2$ and $t_2$ by $t_1$
to deduce $t_2 = t_1 \Rightarrow t_1 = t_2$; we use Lemma 4.4 to deduce the conjunction
of these two formulas; and, finally, the tautology
$(t_1 = t_2 \Rightarrow t_2 = t_1) \wedge (t_2 = t_1 \Rightarrow t_1 = t_2) \Rightarrow (t_1 = t_2 \Leftrightarrow t_2 = t_1)$,
together with MP, gives the required formula.

The deduction of the third and fourth formulas in (a) will be left to the
reader. The existence of a deduction of the fourth formula can be proved by
induction on the number of connectives and quantifiers in $P$. $P$ is represented
in the form $\neg Q$, $Q_1 * Q_2$, $\forall x\, Q$, or $\exists x\, Q$; we assume that the formula with $Q$, $Q_1$,
and $Q_2$ in place of $P$ has already been deduced, and we complete the deduction
for $P$ (see Mendelson, Chapter 2, Proposition 2.25).

(c) If the axioms of equality are $\phi$-true, then so are the formulas in (a), since
they are deducible. The first three formulas in (a), applied to three different
variables $x$, $y$, and $z$, then show that the relation $\phi(=)$ on $M$ is reflexive,
symmetric, and transitive. In fact, let $X$, $Y$, and $Z$ be any three elements of $M$, let
$\xi \in \overline{M}$ be a point such that $x^\xi = X$, $y^\xi = Y$, and $z^\xi = Z$, and let $\sim$ be the
relation $\phi(=)$ on $M$. The $\phi$-truth of the formulas in (a) means that


$X \sim X$; $X \sim Y \Leftrightarrow Y \sim X$; $X \sim Y$ and $Y \sim Z \Rightarrow X \sim Z$.


By definition, to say that $\sim$ is compatible with the $\phi$-interpretation of all
relations and operations on $M$ means the following. Let $p$ be a relation, and
let $\phi(p) \subset M^r$ be its interpretation. If $(X_1, \dots, X_r) \in \phi(p)$ and $X_i' \sim X_i$,
then $(X_1, \dots, X_i', \dots, X_r) \in \phi(p)$. Now let $f$ be an operation, and let $\phi(f) :
M^r \to M$ be its interpretation. If $\phi(f)(X_1, \dots, X_r) = Y$ and $X_i' \sim X_i$, then
$\phi(f)(X_1, \dots, X_i', \dots, X_r) = Y' \sim Y$.

We verify this compatibility by using the $\phi$-truth of the last formula in
4.6(a) at a suitable point $\xi \in \overline{M}$. Here we take the formulas $p(x_1, \dots, x_r)$ and
$f(x_1, \dots, x_r) = y$, respectively, for $P$; we take the variable $x_i'$ for $t$ and the
variable $x_i$ for $x$; and we set $x_i^\xi = X_i$, $(x_i')^\xi = X_i'$, and $y^\xi = Y$.

It follows from the compatibility that we can construct an interpretation $\phi'$
of $L$ in $M' = M/\sim$ such that $\phi'(p) = \phi(p)$ mod $\sim$, $\phi'(f) = \phi(f)$ mod $\sim$, and
$\phi'(=)$ is equality. The last formula in 4.6(a) will then imply that all the $\phi$-true
formulas remain $\phi'$-true, and conversely.

From now on, when we speak of the special axioms for any language in $\mathcal{L}_1$
having the symbol =, we shall without explicit mention always include among 
them the axioms of equality for =. Models in which = is interpreted as equality 
are called normal models. 


Special axioms of arithmetic

4.7. Proposition. The following formulas are true in the standard
interpretation of $L_1\mathrm{Ar}$, and are called the special axioms of $L_1\mathrm{Ar}$:


(a) The axioms of equality.
(b) The axioms of addition:
    $x + 0 = x$; $x + y = y + x$; $(x + y) + z = x + (y + z)$;
    $x + z = y + z \Rightarrow x = y$.
(c) The axioms of multiplication:
    $x \cdot 0 = 0$; $x \cdot 1 = x$; $x \cdot y = y \cdot x$; $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
(d) The distributive axiom:
    $x \cdot (y + z) = x \cdot y + x \cdot z$.
(e) The axioms of induction:
    $P(0) \wedge \forall x (P(x) \Rightarrow P(x + 1)) \Rightarrow \forall x\, P(x)$,
    where $P$ is any formula in $L_1\mathrm{Ar}$ having one free variable.


The proof is trivial and will be left to the reader. We note only that the 
“proof” that the induction axioms are true itself uses induction. 


Remarks 


(a) In (b), (c), and (d) above, we have written the usual axioms for a
commutative (semi)ring in order to shorten the formal deductions; any
informal computation that uses only these axioms can easily be transformed into
a formal deduction of the result of the computation in $L_1\mathrm{Ar}$. In Chapter 3
of Mendelson’s textbook, he gives an apparently weaker set of axioms, and 
then shows how to deduce our formulas from them. This takes up 5-6 pages of 
text, and is basically a tribute to a historical tradition going back to Peano. 

(b) The induction axioms are a countable set of formulas in $L_1\mathrm{Ar}$; it is
customary to say that 4.7(e) is an axiom schema. The corresponding fact in
intuitive mathematics is stated as follows: “For any property $P$ of nonnegative
integers, if 0 has the property $P$, and, whenever $x$ has the property $P$, $x + 1$
also has the property $P$, then all nonnegative integers have the property $P$.”
Here “property of nonnegative integers” means the same as “any subset of the
nonnegative integers.”

However, in the means of expression of $L_1\mathrm{Ar}$ there is no way to say “any
subset.” Neither is there any way to say “all properties”; we can only list
one by one the properties that are definable by formulas in the language. We
recall that there are only countably many such properties, while the intuitive
interpretation refers to a continuum of properties. Thus, the formal axiom of
induction is weaker than the informal one, and is also weaker than the version
of this axiom that is obtained by embedding $L_1\mathrm{Ar}$ in $L_1\mathrm{Set}$.


Special axioms of Zermelo–Fraenkel set theory
(see the description of $V$ in the appendix to Chapter II)


4.8. Proposition. The following formulas are true in the standard
interpretation of $L_1\mathrm{Set}$ in the von Neumann universe $V$:


(a) Axiom of the empty set: $\forall x\, \neg(x \in \varnothing)$.

(b) Axiom of extensionality: $\forall z (z \in x \Leftrightarrow z \in y) \Leftrightarrow x = y$.

(c) Axiom of pairing: $\forall u \forall w \exists x \forall z (z \in x \Leftrightarrow z = u \vee z = w)$.

(d) Axiom of the union: $\forall x \exists y \forall u (\exists z (u \in z \wedge z \in x) \Leftrightarrow u \in y)$.

(e) Axiom of the power set: $\forall x \exists y \forall z (z \subset x \Leftrightarrow z \in y)$, where $z \subset x$ is
    abbreviated notation for the formula $\forall u (u \in z \Rightarrow u \in x)$.

(f) Axiom of regularity: $\forall x (\neg\, x = \varnothing \Rightarrow \exists y (y \in x \wedge y \cap x = \varnothing))$, where $y \cap x = \varnothing$
    is abbreviated notation for $\neg \exists z (z \in y \wedge z \in x)$.


PROOF AND EXPLANATIONS. This is not a complete list of the axioms of
Zermelo–Fraenkel; the axiom of infinity, axiom of replacement, and also the
axiom of choice, which are more subtle, will be discussed in the next subsection.


(a) The truth of these formulas must, of course, be proved by computing
the function $|\ |$ using the rules in 2.4 and 2.5. We do this, for example, for the
axiom of extensionality. Let $\xi$ be any point in the interpretation class, and let
$X = x^\xi$, $Y = y^\xi$. We must show that


$$|\forall z (z \in x \Leftrightarrow z \in y)|(\xi) = |x = y|(\xi),$$

i.e., that


$$\min_{Z \in V} \bigl(|Z \in X|\,|Z \in Y| + (1 - |Z \in X|)(1 - |Z \in Y|)\bigr) = |X = Y|,$$


where we have written $|Z \in X|$ instead of $|z \in x|(\xi')$ with $z^{\xi'} = Z$, $x^{\xi'} = X$,
and so on. But the left-hand side equals 1 if and only if for every $Z \in V$ either
both $Z \in X$ and $Z \in Y$, or else both $Z \notin X$ and $Z \notin Y$, that is, if and only if
$X = Y$.

More generally, if we replace $V$ by any subclass $M \subset V$ and restrict the standard
interpretation of $L_1\mathrm{Set}$ to $M$, then the same reasoning shows that the axiom of
extensionality is true in $M$ if and only if for any elements $X, Y \in M$ we have


$$X = Y \Leftrightarrow X \cap M = Y \cap M,$$

i.e., if and only if every element of $M$ is uniquely determined by its elements
which lie in $M$. This result will be used later.

The analogous computations for all the other axioms will be given sys- 
tematically in a much more difficult context in Chapter III. Hence, at this 
point we shall only explain how to translate them into argot, as in Chapter I, 
and why they are fulfilled in V. 

(b) The axiom of the empty set does not need special comment. We only
remark that if we interpret $L_1\mathrm{Set}$ in a subclass $M \subset V$, then the constant $\varnothing$
may be interpreted as any element $X \in M$ with the property that $X \cap M = \varnothing$,
and this axiom will still hold.

(c) The axiom of pairing is true, because if $U, W \in V_\alpha$, then $\{U, W\} \in \mathcal{P}(V_\alpha)$,
so that all pairs lie in $V$.

(d) The axiom of the union is true, because if $X \in V$, then the set
$Y = \bigcup_{Z \in X} Z$ also lies in $V$. In fact, if $X \in V_{\alpha+1} = \mathcal{P}(V_\alpha)$, then the elements
of $X$ are subsets of $V_\alpha$, and their union therefore lies in $V_{\alpha+1}$.

(e) The axiom of the power set is true, because if $X \in V$, then $\mathcal{P}(X) \in V$.
In fact, if $X \in V_\alpha$, then $X \subset V_\alpha$, and hence $\mathcal{P}(X) \subset \mathcal{P}(V_\alpha) = V_{\alpha+1}$, so that
$\mathcal{P}(X) \in V_{\alpha+2}$.

(f) The axiom of regularity is true, because any nonempty set X € V has 
an empty intersection with at least one of its elements; in this form the axiom 
is proved in the appendix to this chapter. 

4.9. The axioms of $L_1\mathrm{Set}$ in Section 4.8 have one property in common: their
simplest model in the standard interpretation is precisely the union $V_{\omega_0} =
\bigcup_{n=0}^{\infty} V_n$ of the first $\omega_0$ levels of the von Neumann universe. In other words, this
is the set of hereditarily finite sets $X \in V$, i.e., those such that if $X_n \in X_{n-1} \in
\dots \in X_0 = X$ then all the $X_i$ are finite.
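
The first few finite levels are small enough to build explicitly. This is a minimal sketch, assuming the usual recursion $V_0 = \varnothing$, $V_{n+1} = \mathcal{P}(V_n)$ and modeling sets as Python `frozenset` values; the function names are hypothetical. It is meant only to make the growth of the levels and the closure under pairing visible on a small scale.

```python
from itertools import combinations


def powerset(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def von_neumann_levels(n):
    """Return the list [V_0, V_1, ..., V_n] with V_0 empty and V_{k+1} = P(V_k)."""
    levels = [frozenset()]
    for _ in range(n):
        levels.append(powerset(levels[-1]))
    return levels


levels = von_neumann_levels(4)
sizes = [len(level) for level in levels]

# Pairing on a small scale: any two elements of V_3 form a pair lying in V_4.
pairs_stay_inside = all(
    frozenset({U, W}) in levels[4] for U in levels[3] for W in levels[3]
)

# The levels are cumulative: each one is contained in the next.
cumulative = all(levels[k] <= levels[k + 1] for k in range(4))
```

The next level already has $2^{16} = 65536$ elements and the one after that $2^{65536}$, so the explicit construction is practical only for the first five or six levels.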

$V_{\omega_0}$ is the reliable, familiar world of combinatorics and number theory.
Additional principles are needed to force us out of this world. There are two 
such principles: the axiom of infinity and the axiom schema of replacement. 


(a) Axiom of infinity:


$$\exists x (\varnothing \in x \wedge \forall y (y \in x \Rightarrow \{y\} \in x)).$$


Here $\{y\} \in x$ is abbreviated notation for $\exists z (z = \{y, y\} \wedge z \in x)$, where the
meaning of $z = \{y, y\}$ was explained in 3.7 of Chapter I. This axiom
requires that we add to $V_{\omega_0}$ some set containing the elements $\varnothing, \{\varnothing\}, \{\{\varnothing\}\}, \dots$
(a countable sequence). Then, in order to preserve the intuitive version of
the axiom of the power set, we must add $\mathcal{P}(X), \mathcal{P}^2(X), \dots$, thereby hopelessly
leaving the realm of finite sets, countable sets, continua, and so on.

It is a striking fact that none of this is necessary in the formal, as opposed 
to intuitive, version of set theory, where we can always limit ourselves to 
hereditarily countable submodels of V. This important fact will be discussed in 
detail in §7. 

(b) Axiom schema of replacement. We introduce the following convenient
abbreviated notation (in any language of $\mathcal{L}_1$ having the notion of equality):
$\exists! y\, P(y)$ means $\exists y\, P(y) \wedge \forall x \forall y (P(x) \wedge P(y) \Rightarrow x = y)$. Thus, this formula is
read: “There exists a unique object $y$ with the property $P$,” where we assume
that $=$ is interpreted as equality. When other variables besides $y$ occur freely in
$P$, the formula $\exists! y\, P(y)$ is true precisely when $P$ determines $y$ as an “implicit
function” of the other variables.

We can now write the replacement axioms. In the formula $P$ below we list
all the variables that occur freely in $P$:


$$\forall z_1 \cdots \forall z_n \forall u \bigl(\forall x (x \in u \Rightarrow \exists! y\, P(x, y, z_1, \dots, z_n))
\Rightarrow \exists w \forall y (y \in w \Leftrightarrow \exists x (x \in u \wedge P(x, y, z_1, \dots, z_n)))\bigr).$$


The hypothesis says that “$P$ gives $y$ as a function of $x \in u$ (for given values
of the parameters $z_1, \dots, z_n$)”; the conclusion says that “the image of the set $u$
under this function is some set $w$.”

From the standpoint of the formal theory it is worthwhile to note that from 
this axiom and the axioms of equality are deducible the so-called separation 
axioms, namely 


$$\forall z_1 \cdots \forall z_n \forall x \exists y \forall u (u \in y \Leftrightarrow u \in x \wedge P(u, z_1, \dots, z_n)).$$


This says that if we take the class of sets having a property P and intersect it 
with a set x, we obtain a set. 

The replacement axioms should be looked at very carefully. They go beyond 
the usual, “intuitively obvious” working tools of the topologist and analyst. The 
axioms assert that, for example, it is impossible to “stretch” an ordinal $\alpha$ too
far by means of a function $f$; for any $f$ we choose, there is always an ordinal $\beta$
such that all the values $f(\gamma)$, $\gamma < \alpha$, lie in $V_\beta$. In other words, the universe $V$ is
incomparably more infinite than any of its levels $V_\alpha$.

Even if we adopt this axiom, questions remain that are very similar in style,
that are beyond the reach of our intuition, and that are not solvable using
this and the other axioms. For example, do there exist so-called inaccessible
cardinals $\gamma$? One of the properties of an inaccessible cardinal $\gamma$ is the following:
if $f$ is a function from $V_\alpha$ to $V_\gamma$ (with $\alpha < \gamma$), then the set of values of $f$ is an
element of $V_\gamma$. In particular, there is an “upper bound” beyond which ordinals
not exceeding $\gamma$ cannot be “stretched.” Do such infinities exist or not?

After thinking about this and related problems, many specialists on the 
foundations of mathematics have come to the conclusion that such languages 
of set theory as $L_1\mathrm{Set}$ with a suitable axiom system are the only reality one
should work with, and any attempt to make intrinsic sense out of the universe 
V or similar models is in principle doomed to failure. In particular, the set of 
formulas in $L_1\mathrm{Set}$ that are true in the standard interpretation is not defined,
and we can only talk about formulas that are deducible from the axioms. 

But we shall not entirely adopt this point of view for several reasons. The 
simplest reason is the feeling that a language without an interpretation not only 
loses its intrinsic justification, but also cannot be used for anything. We cannot 
even play the “formal game” well unless we master the intuitive concepts that 
give meaning to the symbols. A language (along with the external world) helps 
bring order and precision to these intuitive concepts, which, in turn, make us 
change the language or at least revise our earlier linguistic constructions. But 
we can never assume that we have achieved complete clarity. 



We should understand the need for certain types of self-restraint. However, 
intellectual asceticism (like all other forms of asceticism) cannot be the lot of 
many. 


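(c) Axiom of choice:


$$\forall x \bigl(\neg\, x = \varnothing \Rightarrow \exists y (\text{“$y$ is a function with domain of definition $x$”}
\wedge \forall u (u \in x \wedge \neg\, u = \varnothing \Rightarrow \exists w (w \in u \wedge \text{“$\langle u, w \rangle \in y$”})))\bigr).$$


That is, $y$ chooses one element from each nonempty element $u \in x$.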

The belief that this axiom is true in V is at least as justified as the belief 
in the existence of V itself. Over the past fifty years it has become customary 
for every working mathematician to accept this axiom, and the heated con- 
troversies about it at the beginning of the century are now all but forgotten. 
The interested reader is referred to Chapter II of Foundations of Set Theory by 
Fraenkel and Bar-Hillel (North-Holland, Amsterdam, 1958). 


4.10. General properties of axioms. Despite the wide variety of concepts reflected
in these axioms, each of our sets of axioms for languages in $\mathcal{L}_1$ (tautologies;
Ax $L$; special axioms of $L_1\mathrm{Ar}$ and $L_1\mathrm{Set}$) has the following informal syntactic
characteristics:


(a) An algorithm can be given that tells whether any given expression is an 
axiom (compare the syntactic analysis in §1 and the verification of the 
tautologies in Section 3.4). 

(b) A finite number of rules can be given for generating the axioms. 


It is clear that a priori, property (b) is less restrictive than (a). In fact, an 
algorithm as in (a) can be transformed into a rule for generating the axioms: 
“Write out all possible expressions one by one in some order, and take those for 
which the algorithm gives a positive answer.” 
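
The passage from a deciding algorithm to a generating rule quoted above is short enough to write down. The sketch below is an illustration under assumptions that are not in the text: expressions are strings over a finite alphabet, they are listed by increasing length, and the toy predicate `is_axiom` (which accepts strings of the form `t=t`) merely stands in for a real axiom test. All names are hypothetical.

```python
from itertools import count, islice, product


def all_expressions(alphabet):
    """Yield every nonempty string over the alphabet, shortest first."""
    for n in count(1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def generate_axioms(alphabet, is_axiom):
    """Turn a decision procedure (property (a)) into a generator (property (b))."""
    return (e for e in all_expressions(alphabet) if is_axiom(e))


def is_axiom(expression):
    """Toy decision procedure: accept exactly the strings t=t with t over {a, b}."""
    parts = expression.split("=")
    return len(parts) == 2 and parts[0] != "" and parts[0] == parts[1]


first_axioms = list(islice(generate_axioms("ab=", is_axiom), 4))
```

The generator never terminates on its own, which is why the example takes only the first four results. The converse direction is not available in this way: watching a generator run does not tell us when to stop waiting for an expression that is never produced.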

It is actually natural to suppose that property (a) should characterize 
axioms, and property (b) should characterize deducible formulas, no matter 
how we explicitly describe the axioms and the deducible formulas in a given 
language. In Part III we make these intuitive ideas into precise definitions and 
show that (b) is strictly weaker than (a). See also the discussion in Section 
11.6(c) of this chapter. 


Digression: Proof 


1. A proof becomes a proof only after the social act of “accepting it as a proof.” 
This is as true for mathematics as it is for physics, linguistics, or biology. The 
evolution of commonly accepted criteria for an argument’s being a proof is 
an almost untouched theme in the history of science. In any case, the ideal for 
what constitutes a mathematical demonstration of a “nonobvious truth” has re- 
mained unchanged since the time of Euclid: we must arrive at such a truth from 
“obvious” hypotheses, or assertions that have already been proved, by means 
of a series of explicitly described, “obviously valid” elementary deductions. 
Thus, the method of deduction is a method of mathematics par
excellence. (“Mathematical induction” clearly comes out of the same tradition. 
Peano’s induction principle allows us to write only the first step and the general 
step of a proof, and is thereby in some sense the first metamathematical principle.
This point is obscured by the tradition of listing Peano’s axiom among the
special axioms (see 4.7(e)), but one way or another, it is one of the archetypes 
of mathematical thought.) 

The longer the deductive argument, the more important it is for all its 
elementary components to be written in an explicit and normalized fashion. In 
the last analysis, the amount of initial data in formal mathematics is so small 
that failure to observe the rules of hygiene in long deductions would lead to 
the collapse of the system if we did not have external checks on the system. 
In induction, on the other hand, relatively short deductions are based on a 
vast amount of initial information. Darwin’s theory of evolution is explained to 
school children, but life is not long enough to judge how persuasive the proofs 
are. We see a similar situation in comparative linguistics when the features of 
the so-called protolanguages are reconstructed. In such uses of induction, the 
“rules of deduction” cannot be so very rigid, despite the critical viewpoint of 
the neo-grammarians. 


2. The above observations concerning the method of deduction are supported 
by the fact that the notion of a formal deduction in languages of £; is a close 
approximation to the concept of an ideal mathematical proof. It is therefore 
enlightening to examine the differences between deductions and the arguments 
we use in day-to-day practice. 


(a) Reliability of the principles. Not only the mathematics implicit in the special 
axioms of $L_1\mathrm{Set}$ and $L_1\mathrm{Ar}$, but even the logic of the languages of $\mathcal{L}_1$ is not
accepted by everyone. In particular, Brouwer and others have called into question
the law of the excluded middle. From their extremely critical perspective, our 
“proofs” are at best harmless deductions of nonsense out of falsehood. 


The mathematician cannot permit himself to be completely deaf to these 
criticisms. After thinking about them for a while, he should at least be willing 
to admit that proofs can have objectively different “degrees of proofness.” 


(b) Levels of “proofness.” Every proof that is written must be approved 
and accepted by other mathematicians, sometimes by several generations of 
mathematicians. In the meantime, both the result and the proof itself are 
liable to be refined and improved. Usually the proof is more or less an 
outline of a formal deduction in a suitable language. But, as mentioned 
before, an assertion P is sometimes established by proving that a proof of P 
exists. This hierarchy of proofs of the existence of proofs can, in principle, 
be continued indefinitely. We can take down the hierarchy using 
sophisticated logical and set-theoretic principles; however, not everyone might 
agree with these principles. Papers on constructive mathematics abound with 
assertions of the type, “there cannot not exist an algorithm that computes x,”
whereas a classical mathematician would simply say “x exists,” or even “x exists
and is effectively computable.”

(c) Errors. The peculiarities of the human mind make it impossible in practice to
verify formal deductions, even if we agree that in principle, such a verification 
is the ideal form for a proof. Two circumstances act together with perilous 
effect: formal deductions are much longer than texts in argot, and humans are 
much slower at reading and comprehending such formal arguments than texts 
in natural languages. 


A proof of a single theorem may take up five, fifteen, or even fifty pages. In 
the theory of finite groups, the proofs of the two Burnside conjectures occupy 
nearly five hundred pages apiece. Deligne has estimated that a complete proof 
of Ramanujan’s conjecture assuming only set theory and elementary analysis 
would take about two thousand pages. The length of the corresponding formal 
deductions staggers the imagination. 


Hence, the absence of errors in a mathematical paper (assuming that none 
are discovered), as in other natural sciences, is often established indirectly: how 
well the results correspond to what was generally expected, the use of similar 
arguments in other papers, examination of small sections of the proof “under 
the microscope,” even the reputation of the author—in short, its reproducibility 
in the broadest sense of the word. “Incomprehensible” proofs can play a very 
useful role, since they stimulate the search for more accessible arguments. 


The last two decades have seen the appearance of a very powerful method 
for performing long formal deductions, namely the use of computers. At first 
glance, it would seem that the status of formal deductions might greatly 
improve, so that the Leibnizian ideal of being able to verify truth mechani- 
cally would become attainable. But the state of affairs is actually much less 
trivial. 

We first give two authoritative opinions on this question by C. L. Siegel and 
H. P. F. Swinnerton-Dyer. Both opinions relate to the solution by computer of 
concrete number-theoretic problems. 


3. The present level of knowledge concerning Fermat’s last theorem is as 
follows. Let p be a prime. It is called regular if it does not divide the numerator 
of any of the Bernoulli numbers $B_2 = \frac{1}{6}$, $B_4 = -\frac{1}{30}$, ..., $B_{p-3}$. Fermat’s theorem
was proved for regular prime exponents by Kummer. For irregular p there is a 
series of criteria for Fermat’s theorem to hold. These criteria reduce to checking 
that certain divisibility properties do not hold; if they hold, we must try cer- 
tain other divisibility properties, and so on. The verification for each p requires 
extensive computer computations. As of 1955, this was successfully done for all 
p < 4002 (J. L. Selfridge, C. A. Nicol, H. S. Vandiver, Proc. Nat. Acad. Sci. 
USA, 41, 970-973 (1955)). 


Let v(x) denote the ratio of the number of irregular primes < x to the
number of regular primes < x. Kummer conjectured that $v(x) \to \frac{1}{2}$ as
$x \to \infty$. Siegel (Nachrichten Ak. Wiss. Göttingen, Math. Phys. Klasse, 1964, No.
6, 51-57) suggests that $\sqrt{e} - 1$ is a more likely value for the limit, supports
this opinion with probabilistic arguments, compares with the data of
Selfridge-Nicol-Vandiver, and concludes this discussion with the following unexpected
sentence: “In addition, it must be taken into account that the above numerical
values for v(x) were obtained using computers, and therefore, strictly speaking,
cannot be considered proved”!
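
The ratio v(x) can be explored on a small scale. What follows is a minimal sketch and not the method of Selfridge, Nicol, and Vandiver: it computes exact Bernoulli numbers by the standard recurrence and tests regularity directly from the definition given above. The function names, the restriction to odd primes, and the bound 200 are assumptions of this sketch.

```python
from fractions import Fraction
from math import comb, exp


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_n_max] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # Recurrence: sum_{k=0}^{m} C(m+1, k) * B_k = 0, solved for B_m.
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B


def is_prime(n):
    """Trial division; adequate for the small bounds used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def split_primes(limit):
    """Split the odd primes p < limit into (regular, irregular)."""
    B = bernoulli_numbers(max(limit - 3, 0))
    regular, irregular = [], []
    for p in range(3, limit):
        if not is_prime(p):
            continue
        # p is irregular if it divides a numerator of B_2, B_4, ..., B_{p-3}.
        bad = any(B[k].numerator % p == 0 for k in range(2, p - 2, 2))
        (irregular if bad else regular).append(p)
    return regular, irregular


regular, irregular = split_primes(200)
print(irregular)
# Empirical ratio v(200) next to Siegel's suggested limit sqrt(e) - 1.
print(len(irregular) / len(regular), exp(0.5) - 1)
```

With so small a bound the empirical ratio says little about the limit; the point is only that the definition is mechanical, which is what makes Siegel's reservation about machine computation notable.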


4. Siegel’s point of view can be explained as a natural reaction to informa- 
tion received at second hand. But the excerpts below are from an article by a 
professional mathematician and experienced computer programmer (Acta 
Arithmetica, XVIII, 1971, 371-385). The article is devoted to the following 
problem: 


Let $L_1, L_2, L_3$ be three homogeneous linear forms in u, v, w with real coefficients and
determinant $\Delta$; and suppose that the lower bound of $|L_1 L_2 L_3|$ for integer values of
u, v, w not all zero is 1. What can be said about the possible value for $\Delta$?

The corresponding problem for the product of two linear forms is much easier, and was
essentially completely solved by Markov. There are countably many possible values of
$\Delta$ less than 3, each of which has the form


$\Delta = (9 - 4n^{-2})^{1/2}$


for some integer n; the first few values of n are 1, 2, 5, 13, 29, and there is an algorithm 
for constructing all the permissible values of n. 
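
The algorithm mentioned in the quotation can be illustrated as follows. This is a sketch under an assumption not stated in the quoted passage: that the permissible n are the Markov numbers, i.e., the integers occurring in positive solutions of $x^2 + y^2 + z^2 = 3xyz$, all of which are reached from (1, 1, 1) by replacing one coordinate at a time. The function names and the bound 300 are choices made for this illustration.

```python
from math import sqrt


def markov_numbers(limit):
    """Return the sorted Markov numbers not exceeding `limit`."""
    start = (1, 1, 1)
    seen = {start}
    stack = [start]
    numbers = set()
    while stack:
        a, b, c = stack.pop()
        numbers.update((a, b, c))
        # Each move replaces one coordinate by the other root of the quadratic.
        for t in ((3 * b * c - a, b, c), (a, 3 * a * c - b, c), (a, b, 3 * a * b - c)):
            t = tuple(sorted(t))
            if t[2] <= limit and t not in seen:
                seen.add(t)
                stack.append(t)
    return sorted(numbers)


def delta(n):
    """Value of the determinant attached to n by the formula (9 - 4/n^2)^(1/2)."""
    return sqrt(9 - 4 / n**2)


ns = markov_numbers(300)
print(ns)
print([round(delta(n), 6) for n in ns[:5]])
```

The values delta(n) increase toward 3 without reaching it, which matches the statement that there are countably many permissible values below 3.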


For three forms Davenport (1943) proved that $\Delta = 7$ or $\Delta = 9$ or
$\Delta > 9.1$. In Swinnerton-Dyer’s paper, all values of $\Delta < 17$ are computed under
the assumption that there are only finitely many such values and he gives a
list of them: the third value is $\sqrt{148}$, and the last (the eighteenth) is $\sqrt{2597/9}$.
Discussing this result, he makes a very interesting comment:


When a theorem has been proved with the help of a computer, it is impossi- 
ble to give an exposition of the proof which meets the traditional test—that 
a sufficiently patient reader should be able to work through the proof and 
verify that it is correct. Even if one were to print all the programs and all 
the sets of data used (which in this case would occupy some forty very dull 
pages) there can be no assurance that a data tape has not been mispunched 
or misread. Moreover, every modern computer has obscure faults in its 
software and hardware—which so seldom cause errors that they go unde- 
tected for years—and every computer is liable to transient faults. Such errors 
are rare, but a few of them have probably occurred in the course of the cal- 
culations reported here. 


The arguments on the positive side are also very curious: 


However, the calculation consists in effect of looking for a rather small num- 
ber of needles in a six-dimensional haystack; almost all the calculation is 
concerned with parts of the haystack which in fact contain no needles, and an 
error in those parts of the calculation will have no effect on the final results. 
Despite the possibilities of error, I therefore think it almost certain that the 
list of permissible $\Delta < 17$ is complete; and it is inconceivable that an infinity
of permissible $\Delta < 17$ have been overlooked.

His conclusion:


Nevertheless, the only way to verify these results (if this were thought worth 
while) is for the problem to be attacked quite independently, by a differ- 
ent machine. This corresponds exactly to the situation in most experimental 
sciences. 


We note that it is becoming more and more apparent that the processing, 
and also the storage, of large quantities of information outside the human brain 
leads to social problems that go far beyond questions of the reliability of math- 
ematical deductions. 


5. In conclusion, we quote an impression concerning mechanical proofs, even 
ones done by hand, which is experienced by many. 

After stating a proposition to the effect that “the function Two is correctly 
defined,” a gifted and active young mathematician writes (Inventiones Math., 
vol. 3, f.3 (1967), 230): 


The proof of this Proposition is a ghastly but wholly straightforward set of 
computations. It took me several hours to do every bit and as I was no wiser 
at the end—except that I knew the definition was correct—I shall omit details 
here. 


The moral: a good proof is one that makes us wiser. 


5 Tautologies and Boolean Algebras 
5.1 Proposition. A finite list, or “basis,” of tautologies—logical polynomials in 
three variables P,Q, R—can be given with the following property. 

Let L be any language in $\mathcal{L}_1$, and let F be the set of all formulas in L that
can be obtained from the basis tautologies by substituting all possible formulas 
in place of P,Q,R. Then any tautology in L is deducible from F using only the 
rule of deduction MP. 

The choice of the basis tautologies is by no means unique. Our list will 
consist of the tautologies A0, A1, A2, A3, B1, B2 in Section 3.4 and the following
tautologies: 


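C1 $\neg(P \Rightarrow \neg Q) \Rightarrow (P \wedge Q)$, $(P \wedge Q) \Rightarrow \neg(P \Rightarrow \neg Q)$.
C2 $(\neg P \Rightarrow Q) \Rightarrow (P \vee Q)$, $(P \vee Q) \Rightarrow (\neg P \Rightarrow Q)$.
C3 $P \Rightarrow (\neg Q \Rightarrow \neg(P \Rightarrow Q))$.
C4 $(P \Rightarrow Q) \Rightarrow ((\neg P \Rightarrow Q) \Rightarrow Q)$.
C5 $(P \Rightarrow Q) \Rightarrow (\neg Q \Rightarrow \neg P)$.
C6 $(P \Rightarrow Q) \Rightarrow ((Q \Rightarrow P) \Rightarrow (P \Leftrightarrow Q))$.
C7 $(P \Leftrightarrow Q) \Rightarrow (P \Rightarrow Q)$, $(P \Leftrightarrow Q) \Rightarrow (Q \Rightarrow P)$.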


We are not trying to economize on the size of the basis, but rather on the length
of the proof of Proposition 5.1; hence, A0–C7 is not the shortest possible list.
This does not make any difference for studying the logic of $\mathcal{L}_1$; but the study
of modified logical systems, for example those of the intuitionist type, requires
more careful analysis of this list.
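
That the formulas C1–C7 really are tautologies can be checked by brute force over truth values. The following is a small sketch: the formulas are transcribed as Boolean functions of P and Q (as read here from the list above), and the helper names `imp`, `iff`, and `is_tautology` are introduced only for this illustration.

```python
from itertools import product


def imp(a, b):
    """Truth table of implication."""
    return (not a) or b


def iff(a, b):
    """Truth table of equivalence."""
    return a == b


# Each entry is one of the formulas C1-C7, written as a Boolean function of P, Q.
BASIS = {
    "C1a": lambda P, Q: imp(not imp(P, not Q), P and Q),
    "C1b": lambda P, Q: imp(P and Q, not imp(P, not Q)),
    "C2a": lambda P, Q: imp(imp(not P, Q), P or Q),
    "C2b": lambda P, Q: imp(P or Q, imp(not P, Q)),
    "C3": lambda P, Q: imp(P, imp(not Q, not imp(P, Q))),
    "C4": lambda P, Q: imp(imp(P, Q), imp(imp(not P, Q), Q)),
    "C5": lambda P, Q: imp(imp(P, Q), imp(not Q, not P)),
    "C6": lambda P, Q: imp(imp(P, Q), imp(imp(Q, P), iff(P, Q))),
    "C7a": lambda P, Q: imp(iff(P, Q), imp(P, Q)),
    "C7b": lambda P, Q: imp(iff(P, Q), imp(Q, P)),
}


def is_tautology(f, arity=2):
    """True if f takes the value True on every assignment of truth values."""
    return all(f(*values) for values in product([False, True], repeat=arity))


results = {name: is_tautology(f) for name, f in BASIS.items()}
print(results)
```

Such a check establishes only that the basis consists of tautologies; that every tautology is deducible from the basis is the content of Proposition 5.1 and requires the proof given in the text.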


PROOF OF PROPOSITION 5.1. Let $\mathcal{E}$ be a finite set of formulas in L, and let
P be a logical polynomial (with a fixed representation) over $\mathcal{E}$. For any map
$v : \mathcal{E} \to \{0, 1\}$, we extend v to P using the same rules that defined the truth
function | | in Section 2.5. We set


$P^v = P$ if $v(P) = 1$, and $P^v = \neg P$ if $v(P) = 0$.


5.2. Fundamental Lemma. Let $\mathcal{E}^v = \{Q^v \mid Q \in \mathcal{E}\}$. Then for any v we have
$F \cup \mathcal{E}^v \vdash P^v$ (using MP).

This lemma expresses the following idea. It is natural to prove Proposition 
5.1 by induction on the length of the tautology. However, the component parts 
of a tautology themselves might not be tautologies. The operation of taking P
to $P^v$ forces any formula to be “v-true” and makes it possible for us to use
induction. 


5.3. PROOF OF 5.1 ASSUMING THE FUNDAMENTAL LEMMA. Let P be a tautology,
so that $P^v = P$ for all v. Set $\mathcal{E} = \{P_1, \dots, P_r\}$. By the fundamental
lemma, $F \cup \{P_1^v, \dots, P_r^v\} \vdash P$ using MP for any v. We show that then
$F \cup \{P_1^v, \dots, P_{r-1}^v\} \vdash P$ using MP. Descending induction on r then gives the
required assertion (the assumption that P is a logical polynomial in $P_1, \dots, P_r$
is not used in the induction step).

The Deduction Lemma 4.5 shows that $F \cup \{P_1^v, \dots, P_{r-1}^v\} \vdash (P_r^v \Rightarrow P)$ using
MP; to see this we only need examine the proof and notice that the deduction
used only MP and the tautologies in F, since the rule of deduction Gen was
not needed.

Since for any v there exists a v' that coincides with v on $P_1, \dots, P_{r-1}$ but
takes a different value on $P_r$, it follows that $P_r \Rightarrow P$ and $\neg P_r \Rightarrow P$ are
deducible from $F \cup \{P_1^v, \dots, P_{r-1}^v\}$ using MP. On the other hand, the
tautology C4: $(P_r \Rightarrow P) \Rightarrow ((\neg P_r \Rightarrow P) \Rightarrow P)$ lies in F. Applying MP twice, we
deduce P.


5.4. PROOF OF THE FUNDAMENTAL LEMMA. We use induction on the number
of connectives in the representation of P as a logical polynomial over $\mathcal{E}$. If there
are no connectives, that is, $P \in \mathcal{E}$, then the assertion is obvious. Otherwise, P
has the form $\neg Q$ or $Q_1 * Q_2$, where $*$ is one of the binary connectives.
(a) The case $P = \neg Q$. If v(Q) = 0, then $Q^v = \neg Q = P = P^v$. That $Q^v = P^v$ is
deducible from $F \cup \mathcal{E}^v$ is precisely the induction assumption.

On the other hand, if v(Q) = 1, then $Q^v = Q$, $P^v = \neg\neg Q$. Here Q is
deducible from $F \cup \mathcal{E}^v$ by the induction assumption, and then the tautology
$Q \Rightarrow \neg\neg Q$ in F along with MP gives a deduction of $P^v$.


(b) The case $P = Q_1 * Q_2$. For the different connectives and possible values of
$v(Q_1)$ and $v(Q_2)$ we first tabulate the formulas for which deductions exist by
the induction assumption and the formulas for which we must find deductions.
In the columns under $\wedge$ and $\vee$ we give formulas from which $(Q_1 \wedge Q_2)^v$ and
$(Q_1 \vee Q_2)^v$, respectively, are deducible using MP and the tautologies in F
(tautologies C1, C2, and C5). Hence it suffices to find deductions of each of
formulas 1-16 from F and the pair of formulas in the appropriate row in the
second column using MP.


Deduction of formulas 1-16.


                   Given:                   Must find: deduction of $(Q_1 * Q_2)^v$
                   deductions of
$v(Q_1)$ $v(Q_2)$  $Q_1^v$ and $Q_2^v$      $\Rightarrow$                        $\wedge$
   0        0      $\neg Q_1$, $\neg Q_2$   1. $Q_1 \Rightarrow Q_2$             5. $\neg\neg(Q_1 \Rightarrow \neg Q_2)$
   0        1      $\neg Q_1$, $Q_2$        2. $Q_1 \Rightarrow Q_2$             6. $\neg\neg(Q_1 \Rightarrow \neg Q_2)$
   1        0      $Q_1$, $\neg Q_2$        3. $\neg(Q_1 \Rightarrow Q_2)$       7. $\neg\neg(Q_1 \Rightarrow \neg Q_2)$
   1        1      $Q_1$, $Q_2$             4. $Q_1 \Rightarrow Q_2$             8. $\neg(Q_1 \Rightarrow \neg Q_2)$

$v(Q_1)$ $v(Q_2)$  $Q_1^v$ and $Q_2^v$      $\vee$                               $\Leftrightarrow$
   0        0      $\neg Q_1$, $\neg Q_2$   9. $\neg(\neg Q_1 \Rightarrow Q_2)$  13. $Q_1 \Leftrightarrow Q_2$
   0        1      $\neg Q_1$, $Q_2$        10. $\neg Q_1 \Rightarrow Q_2$       14. $\neg(Q_1 \Leftrightarrow Q_2)$
   1        0      $Q_1$, $\neg Q_2$        11. $\neg Q_1 \Rightarrow Q_2$       15. $\neg(Q_1 \Leftrightarrow Q_2)$
   1        1      $Q_1$, $Q_2$             12. $\neg Q_1 \Rightarrow Q_2$       16. $Q_1 \Leftrightarrow Q_2$


Note that if P is deducible then for any Q the formula $Q \Rightarrow P$ is also
deducible (tautology A1 and MP) and if $\neg P$ is deducible then for any Q the
formula $P \Rightarrow Q$ is deducible (tautology B2 and MP). This immediately yields
deductions of 1, 2, 4, 10, and 12. If we remove the double negations in the $\wedge$
column using tautology B1 and MP, we obtain deductions of 5, 6, and 7. And 11 is
deducible since by B1 the second column yields a deduction of $\neg\neg Q_1$. In the
first and last rows the deductions of 1 and 4 yield deductions of $Q_2 \Rightarrow Q_1$ by
symmetry; tautology C6 and MP twice give a deduction of 13 and 16 from
$Q_1 \Rightarrow Q_2$ and $Q_2 \Rightarrow Q_1$.


3 is deduced from C3: $Q_1 \Rightarrow (\neg Q_2 \Rightarrow \neg(Q_1 \Rightarrow Q_2))$ and the second column
using MP twice.
8 is deduced from C3: $Q_1 \Rightarrow (\neg\neg Q_2 \Rightarrow \neg(Q_1 \Rightarrow \neg Q_2))$ and the second column
using MP, applying B1 to $Q_2$, and again using MP.
9 is deduced from C3: $\neg Q_1 \Rightarrow (\neg Q_2 \Rightarrow \neg(\neg Q_1 \Rightarrow Q_2))$ using MP twice.
15 is deduced from 3 by C7 and C5 and MP twice.


Finally, the deduction of 3 from $Q_1$ and $\neg Q_2$ yields by symmetry a deduction
of $\neg(Q_2 \Rightarrow Q_1)$ from $\neg Q_1$ and $Q_2$. Hence on the second row the deduction of 14
is analogous to that of 15.


Proposition 5.1 is proved. 

5.5. Tautologies and probability. Tautologies are statements that are true
independently of the truth or falsity of their “component parts.” This 
assertion still holds even if the components of a tautology are assigned proba- 
bilistic truth values || P|| in the algebra of measurable sets in some probability 
space. 

An example: the tautology $R \vee S \vee \neg R \vee \neg S$ (“either it will rain, or it will
snow, or it won’t rain, or it won’t snow”*) is a reliable weather forecast despite
the great complexity of the meteorological probability space. 

For a precise result, it is convenient to use the terminology of Boolean 
algebras. 


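5.6. Boolean algebras. A Boolean algebra B is a set with an operation ' of rank
one, with two operations $\vee$ and $\wedge$ of rank two, and with two distinguished
elements 0 and 1, such that the following axioms hold:


(a) $(a')' = a$ for all $a \in B$;
(b) $\wedge$ and $\vee$ are each associative and commutative;
(c) $\wedge$ and $\vee$ are distributive with respect to one another;
(d) $(a \vee b)' = a' \wedge b'$, $(a \wedge b)' = a' \vee b'$;
(e) $a \vee a = a \wedge a = a$;
(f) $1 \wedge a = a$; $0 \vee a = a$.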


EXAMPLES. 


(a) B is the set of all subsets of a set M, ' is complement, $\wedge$ is intersection, $\vee$
is union, 0 is the empty subset, and 1 is all of M. 

(b) B is the set of open-and-closed subsets of a topological space M with the 
same operations. 

(c) B is the algebra of measurable subsets (modulo measure-zero subsets) of a 
probability space M with the same operations. 


In all of these cases B can be identified with the space of characteristic 
functions of the corresponding subsets of M (taking the value 1 on the subset 
and 0 on the complement). 


5.7. Boolean truth functions. Let B be a Boolean algebra, and let $\mathcal{E}$ be a set of
formulas in a language L. Let $\|\ \| : \mathcal{E} \to B$ be any map. We extend this map
to the logical polynomials over $\mathcal{E}$ (more precisely, to their representations) by
means of the recursive formulas


$\|P \Leftrightarrow Q\| = (\|P\| \wedge \|Q\|) \vee (\|P\|' \wedge \|Q\|')$,
$\|P \Rightarrow Q\| = \|P\|' \vee \|Q\|$,
$\|P \vee Q\| = \|P\| \vee \|Q\|$,
$\|P \wedge Q\| = \|P\| \wedge \|Q\|$,
$\|\neg P\| = \|P\|'$.


* A Russian proverb (translator’s note). 

In the case B = {0,1}, these formulas coincide with the definitions in 2.5.
We note that V and A have different meanings in the left- and right-hand sides. 


5.8. Proposition. Let the logical polynomial P be a tautology over $\mathcal{E}$. Then for
any map $\|\ \| : \mathcal{E} \to B$ to any Boolean algebra B we have $\|P\| = 1$.


PROOF. An example of a natural map $\|\ \|$ can be obtained as follows: if we are
given an interpretation of L in a set M, then the truth functions $|P|(\xi)$ can be
considered as the characteristic functions of the definable subsets of the
interpretation class $\overline{M}$ (compare §2). Hence, our usual truth functions are essentially
Boolean-valued. They are embedded in the Boolean algebra of all subsets of $\overline{M}$,
which decomposes as a direct product of two-point Boolean algebras {0,1}.
Hence the proposition follows trivially in this case.

In the general case one could use Stone’s structure theorem for Boolean 
algebras. However, instead of this we shall indicate how to reduce the problem 
to some simple computations using Proposition 5.1. Because of Proposition 5.1, 
it suffices to verify that the basis tautologies are $\|\ \|$-true and that $\|\ \|$-truth
is preserved when we use MP. For example, if $\|P\| = 1$ and $\|P \Rightarrow Q\| = 1$, then
$\|P\|' = 0$ while $\|P\|' \vee \|Q\| = 1$, so that $\|Q\| = 1$ by 5.6(f); this answers the
question about MP. The truth values of the basis tautologies are computed in 
a similar manner using the axioms in 5.6. 
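
Proposition 5.8 can be observed experimentally in Example 5.6(a). The following sketch takes B to be the algebra of all subsets of a hypothetical three-point set M, represents formulas as nested tuples (a representation chosen only for this illustration), and evaluates them by the recursive formulas of 5.7. It checks two tautologies against every assignment of subsets to the atoms R and S; this is an experiment, not a proof.

```python
from itertools import combinations, product

M = frozenset(range(3))      # hypothetical three-point set
ZERO, ONE = frozenset(), M   # distinguished elements of the power-set algebra


def comp(a):
    """The operation ' : complement in M."""
    return M - a


def value(formula, env):
    """Evaluate ||formula||; env maps atom names to subsets of M."""
    op = formula[0]
    if op == "atom":
        return env[formula[1]]
    if op == "not":
        return comp(value(formula[1], env))
    a, b = value(formula[1], env), value(formula[2], env)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "imp":
        return comp(a) | b
    if op == "iff":
        return (a & b) | (comp(a) & comp(b))
    raise ValueError(op)


def subsets(s):
    """All subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


R, S = ("atom", "R"), ("atom", "S")
forecast = ("or", ("or", R, S), ("or", ("not", R), ("not", S)))
c4 = ("imp", ("imp", R, S), ("imp", ("imp", ("not", R), S), S))

always_one = all(
    value(f, {"R": r, "S": s}) == ONE
    for f in (forecast, c4)
    for r, s in product(subsets(M), repeat=2)
)
print(always_one)
# A formula that is not a tautology can take the value 0.
print(value(("imp", R, S), {"R": ONE, "S": ZERO}) == ZERO)
```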


Boolean truth functions will be the basic tool in the presentation of Cohen 
forcing in Chapter III. 


Digression: Kennings 


1. The process in §5 generates all possible tautologies starting with a finite 
number of tautologies and using a finite number of rules. It has become very 
popular in modern linguistics to attempt to find a suitable description of natural 
languages by means of such generating rules (N. Chomsky and others; see, for 
example, the book Éléments de linguistique mathématique by A. V. Gladkii and
I. A. Mel’čuk, Paris, Dunod, 1972).

However, many psychologists consider that this conception has little to do 
with the actual process of speech. According to one such opinion, real speech 
has more in common with a game of chance, chasing a fugitive, or a river 
current near a jagged shoreline. The choice of the next word in a sentence is 
determined statistically both by a formulating principle (an idea, situation, or 
psychological state) and by the peculiarities of semantics, grammar, phonetics, 
and the associative cloud formed by the earlier words. 

There is reason to hope that formal grammars are more closely suited to 
describing special fragments of natural languages that are in some sense more 
rigidly defined, such as certain language fragments in poetry or law. In these
fragments an essential role is played by “prohibitions,” which weed out, say, all
texts not having a certain rhythmic pattern. Even the most casual attempt at
writing poetry reveals the psychological reality of prohibitions in versification.
But it is much less obvious that there is a set of generating rules that also has
a psychological reality.


¹ A metaphorical compound word or phrase used specially in Old English and Old
Norse poetry, e.g. ‘swan-road’ for ‘ocean’ (Webster’s New Collegiate Dictionary;
translator’s note).


2. Yet there has been at least one poetic system in which generating rules 
occupied an important place. One of the basic elements of skaldic (ancient 
Icelandic) poetry consisted of special formulas called kennings. A kenning is an 
expression that can replace a single word. For example, 


“storm of spears” is a kenning for “battle”


“tree of battle,” “bush of the helmet,” “thrower of swords,” and “giver of gold”
are kennings for “warrior” or “man”


“sea of the wagon” is a kenning for “earth”
“fire of war” is a kenning for “gold”


“sky of sand” and “field of seals” are kennings for “sea,” and so on.

A simple kenning is a kenning no part of which is a kenning. The examples 
above are all simple kennings. They play the role of axioms; obviously, only very 
great poets have the right to create new simple kennings. It falls to the lot of 
the lesser poets to create new kennings using the rules of deduction. The rule 
of deduction of a new kenning from earlier kennings is as follows: any word in 
a kenning may be replaced by a (not necessarily simple) kenning for that word. 
Here is a complicated example of a kenning together with its decomposition 
into simple kennings (an actual example): 


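“thrower of the fire of the storm of the witch of the moon of the steed of the ship
stables”


“steed of the ship stables” = ship
“moon of the ship” = shield
“witch of the shield” = spear
“storm of spears” = battle
“fire of battle” = sword
“thrower of swords” = warrior or man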
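
The rule of deduction for kennings is a substitution rule, and it can be sketched in a few lines. The table below is an assumption of this illustration: it records one simple kenning per word, segmented as in the decomposition of the example above, and the names `SIMPLE` and `expand` are hypothetical. Repeated substitution then rebuilds the compound kenning from the single word “warrior.”

```python
# One simple kenning per word: word -> (head phrase, word inside the kenning).
# The segmentation follows the decomposition of the example in the text.
SIMPLE = {
    "warrior": ("thrower of the", "sword"),
    "sword": ("fire of the", "battle"),
    "battle": ("storm of the", "spear"),
    "spear": ("witch of the", "shield"),
    "shield": ("moon of the", "ship"),
    "ship": ("steed of the", "ship stables"),
}


def expand(word):
    """Replace a word by its kenning, then expand the word inside it, and so on.

    Assumes the table has no cycles; otherwise the recursion would not stop.
    """
    if word not in SIMPLE:
        return word
    head, inner = SIMPLE[word]
    return f"{head} {expand(inner)}"


print(expand("warrior"))
print(expand("battle"))
```

A fuller model would allow several kennings per word and a choice at each step; the sketch fixes one choice so that the derivation is deterministic.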


The Soviet poet Leonid Martynov thought of kennings as metaphors (a 
fundamental error, although an understandable one—kennings and metaphors 
play completely different structural roles in different poetic systems), and he 
wrote a poem “Songs of the Skalds” which ends as follows: 

... But perhaps the translators have gotten a bit carried away?


No! 

In our times, too, 
might there not live 
some throwers 
of the fire 
of the storm 
of the witch 
of the moon 
of the steed 
of the ship stables, 
squanderers 
of the amber 
of the cold earth 
of the great boar? 


or 


Anything is possible !! 
And who can be so very sure 
That there are no longer songs 
which could be called 


Surf 
of yeast 
of the people 
of the bones 
of the fjord? 


Perhaps there really are such songs now, 
Who can tell?? 

After all this, the professional opinion of M. I. Steblin-Kamenskii, whose 
book Icelandic Culture (Leningrad, Nauka, 1967) provided us with the above 
examples, sounds a little anticlimactic: “As a rule, any kenning for a man or 
warrior was no richer in content than the pronoun ‘he.’” 


EXERCISES: 


(a) Find the simple kennings from which the last two kennings in Martynov’s poem 
are deduced. 

(b) Construct the kennings of maximum length that are deducible from all the 
simple kennings in the above text. Prove that it is impossible to deduce longer 
kennings. 


6 Gödel’s Completeness Theorem


6.1. Let L be a language in $\mathcal{L}_1$, let $\phi$ be an interpretation of L, and let $T_\phi L$ be
the set of $\phi$-true formulas. In §3 it was shown that the set $T_\phi L$ is Gödelian: it is
complete, does not contain a contradiction, is closed with respect to deduction,
and contains all the logical axioms Ax L. We say that a set of formulas $\mathcal{E}$
in L is consistent if the set of formulas deducible from $\mathcal{E}$ does not contain a
contradiction, i.e., if there is no P such that $\mathcal{E} \vdash P$ and $\mathcal{E} \vdash \neg P$; otherwise,
we say that $\mathcal{E}$ is inconsistent. The basic purpose of this section is to prove the
following converse of the result in §3:


6.2. Theorem (Gödel)


(a) Any Gödelian set T is the set of $\phi$-true formulas $T_\phi L$ for a suitable
interpretation of L in some set M having cardinality $\le$ card (alphabet of L) +
$\aleph_0$. (Here and below we always mean the cardinality of the alphabet without
the variables.)

(b) Any set of formulas $\mathcal{E}$ which contains Ax L and is consistent can be imbedded
in a Gödelian set.


The model M which is constructed in the proof consists of expressions in
some extension of the alphabet of L, and thus has a somewhat artificial character.
In the next section we show that, if we are given some natural interpretation
$(M, \phi)$ of L, then we can find a submodel having cardinality $\le$ card (alphabet
of L) + $\aleph_0$.


6.3. Corollary (Deducibility criterion). Let $\mathcal{E} \supset$ Ax L.


(a) A formula P is deducible from $\mathcal{E}$ if and only if either $\mathcal{E}$ is inconsistent, or
P is $\phi$-true for all models $\phi$ of the set $\mathcal{E}$ having cardinality $\le$ card (alphabet
of L) + $\aleph_0$.

(b) A formula P is independent of $\mathcal{E}$ if and only if both $\mathcal{E} \cup \{P\}$ and $\mathcal{E} \cup \{\neg P\}$
are consistent; by Theorem 6.2, this is true if and only if $\mathcal{E} \cup \{P\}$ and
$\mathcal{E} \cup \{\neg P\}$ have models.


In what follows we shall often omit the verification that various formal 
deductions exist. If the reader wants to fill in such a verification, this can almost 
always be done more easily using deducibility criterion 6.3 than directly. 


PROOF OF THE COROLLARY 


(a) If $\mathcal{E}$ is inconsistent, then any formula can be deduced from $\mathcal{E}$
(Proposition 4.2). Suppose $\mathcal{E}$ is consistent and P is $\phi$-true for all models of
$\mathcal{E}$. Let $\bar P = \forall x_1 \cdots \forall x_n P$ be the “closure” of P. To prove that $\mathcal{E} \vdash P$, we
consider two cases.

(a1) $\mathcal{E} \cup \{\neg\bar P\}$ is inconsistent. Then $\mathcal{E} \cup \{\neg\bar P\} \vdash \bar P$, so that, by the
Deduction lemma, $\mathcal{E} \vdash \neg\bar P \Rightarrow \bar P$. The tautology $(\neg\bar P \Rightarrow \bar P) \Rightarrow \bar P$ and MP
give $\mathcal{E} \vdash \bar P$, and then the axiom of specialization and MP give $\mathcal{E} \vdash P$.

(a2) $\mathcal{E} \cup \{\neg\bar P\}$ is consistent. Then, by Theorem 6.2, the set $\mathcal{E} \cup \{\neg\bar P\}$ has a
model. In this model $\mathcal{E}$ is true and $\bar P$ is false, so that this case is impossible.


(b) Suppose that P is independent of $\mathcal{E}$, i.e., neither P nor $\neg P$ is deducible.
Then, by part (a), there exists a model of $\mathcal{E}$ in which P is true and a model of
$\mathcal{E}$ in which P is false. The converse is obvious.

We now proceed to the proof of Gödel’s completeness theorem.

6.4. Definition. Let $\mathcal{E}$ be a set of formulas in a language L. The alphabet of
L is said to be sufficient for $\mathcal{E}$ if, for each closed formula $\neg\forall x\, P(x)$ in $\mathcal{E}$ there
exists a constant $c_P$ (depending on P) such that the formula


$R_P$: $\neg\forall x\, P(x) \Rightarrow \neg P(c_P)$


belongs to $\mathcal{E}$.

The intuitive meaning of $R_P$ is: “If not all x have the property P, then some
concrete object $c_P$ can be found that does not have this property.” We say that
the alphabet (rather than $\mathcal{E}$) is “sufficient” or “insufficient” because if $\mathcal{E}$ does
not contain enough formulas of the type $R_P$, we can simply add all the $R_P$ to
$\mathcal{E}$, while if there are not enough constants $c_P$, we then have to add them to the
alphabet of the language.

The plan for proving Theorem 6.2 is as follows. We first prove the funda- 
mental lemma: 


6.5. Fundamental Lemma. If a set of formulas $\mathcal{E}$ in a language L is consistent
and complete and contains Ax L, and if the alphabet of L is sufficient for $\mathcal{E}$,
then $\mathcal{E}$ has a model with cardinality $\le$ card(alphabet of L) + $\aleph_0$.


The next two lemmas allow us to embed any consistent $\mathcal{E}$ in a complete set,
or in one for which the alphabet is sufficient.


6.6. Lemma. If $\mathcal{E}$ is consistent and contains Ax L, then there exists a consistent
and complete set of formulas $\mathcal{E}' \supset \mathcal{E}$.


6.7. Lemma. If $\mathcal{E}$ is consistent and contains Ax L, then there exist:


(a) a language L’ whose alphabet is obtained from the alphabet of L by adding
a set of new constants having cardinality $\le$ card(alphabet of L) + $\aleph_0$.

(b) a set of formulas $\mathcal{E}'$ in L’ that is consistent, contains $\mathcal{E}$ and Ax L’, and
has the property that the alphabet of L’ is sufficient for $\mathcal{E}'$.


However, these constructions get in each other’s way. If we complete a set $\mathcal{E}$
for which the alphabet is sufficient, we might obtain a set with an insufficient
alphabet; if we add new constants, we increase the overall supply of formulas
in the language, and thereby lose the completeness of $\mathcal{E}$. Hence, we have to
alternate the constructions in 6.6 and 6.7 a countable number of times in order 
to prove our last lemma: 


6.8. Lemma. If $\mathcal{E} \supset$ Ax L is consistent, then there exist:


(a) a language $L^{(\infty)}$ whose alphabet is obtained from the alphabet of L by adding
a set of new constants having cardinality $\le$ card(alphabet of L) + $\aleph_0$.

(b) a set of formulas $\mathcal{E}^{(\infty)}$ in $L^{(\infty)}$ that is complete and consistent, contains $\mathcal{E}$
and Ax $L^{(\infty)}$, and has the property that the alphabet of $L^{(\infty)}$ is sufficient
for $\mathcal{E}^{(\infty)}$.


After Lemma 6.8 is proved, Theorem 6.2 is obtained from the fundamental
lemma applied to $\mathcal{E}^{(\infty)}$ if we restrict the resulting model to L and $\mathcal{E}$.

We now prove the lemmas. The fundamental lemma is proved in 6.9, and
Lemmas 6.6, 6.7, and 6.8 are proved in Sections 6.10, 6.11, and 6.12, respectively.


6.9. PROOF OF THE FUNDAMENTAL LEMMA. We begin by explicitly constructing
the interpretation $\phi$ of L that will be our model for $\mathcal{E}$.

(a) By a constant term we mean a term in L that does not contain any symbols
for variables. We let $M = \{\bar t \mid t \text{ is a constant term}\}$ be a “second copy” of the
set of constant terms, and we define the primary mappings of the interpretation
$\phi$ of L in M as follows:


$\phi(c) = \bar c$ (for any constant c);
$\phi(f)(\bar t_1, \dots, \bar t_r) = \overline{f(t_1, \dots, t_r)}$ (for each operation symbol f of
degree r and all constant terms $t_1, \dots, t_r$);
$(\bar t_1, \dots, \bar t_r) \in \phi(p)$ if and only if $p(t_1, \dots, t_r) \in \mathcal{E}$
(for each relation p of degree r and all constant terms $t_1, \dots, t_r$).


We now prove the following claim:


(b) Claim. Let P be a closed formula. Then $|P|_\phi = 1$ if and only if $P \in \mathcal{E}$.
(This claim implies that $\phi$ is a model for $\mathcal{E}$. In fact, if $P \in \mathcal{E}$ is not closed, then
its closure $\forall x_1 \cdots \forall x_n P$ is deducible from $\mathcal{E}$ using Gen, and hence, since $\mathcal{E}$ is
complete and consistent, $\forall x_1 \cdots \forall x_n P \in \mathcal{E}$. By the claim, $|\forall x_1 \cdots \forall x_n P|_\phi = 1$,
so that $|P|_\phi = 1$.)


PROOF OF THE CLAIM. We use induction on the total number of quantifiers
and connectives in P. We shall write |P| instead of $|P|_\phi$.

($b_1$) P is an atomic formula $p(t_1, \dots, t_r)$. The claim follows from the
definition of |P| and the list of primary mappings, since the $t_i$ are constant terms
(or else P would not be closed).

($b_2$) $P = \neg Q$. If |P| = 1, then |Q| = 0 and $Q \notin \mathcal{E}$ by the induction
assumption applied to Q; since $\mathcal{E}$ is complete, we have $\neg Q \in \mathcal{E}$, i.e., $P \in \mathcal{E}$. On
the other hand, if |P| = 0, then |Q| = 1 and $Q \in \mathcal{E}$, so that $\neg Q \notin \mathcal{E}$ since $\mathcal{E}$ is
consistent.

($b_3$) $P = (Q_1 \Rightarrow Q_2)$. We first show that if |P| = 0 then $P \notin \mathcal{E}$. In fact, in
this case $|Q_1| = 1$ and $|Q_2| = 0$; by the induction assumption, $Q_1 \in \mathcal{E}$, $Q_2 \notin \mathcal{E}$;
since $\mathcal{E}$ is complete, $\neg Q_2 \in \mathcal{E}$; using the tautology $Q_1 \Rightarrow (\neg Q_2 \Rightarrow \neg(Q_1 \Rightarrow Q_2))$
and using MP twice yields $\mathcal{E} \vdash \neg(Q_1 \Rightarrow Q_2)$. Since $\mathcal{E}$ is complete and consistent,
all closed formulas that are deducible from $\mathcal{E}$ belong to $\mathcal{E}$; hence,
$\neg(Q_1 \Rightarrow Q_2) = \neg P \in \mathcal{E}$, so that $P \notin \mathcal{E}$.

We now show that if $P \notin \mathcal{E}$, then |P| = 0. In fact, since $\mathcal{E}$ is complete,
we then have $\neg P = \neg(Q_1 \Rightarrow Q_2) \in \mathcal{E}$. The tautologies $\neg(Q_1 \Rightarrow Q_2) \Rightarrow Q_1$
and $\neg(Q_1 \Rightarrow Q_2) \Rightarrow \neg Q_2$ and MP give $\mathcal{E} \vdash Q_1$ and $\mathcal{E} \vdash \neg Q_2$, so that since $\mathcal{E}$ is
complete and consistent, $Q_1 \in \mathcal{E}$ and $\neg Q_2 \in \mathcal{E}$. By the induction assumption,
$|Q_1| = 1$ and $|Q_2| = 0$, so that $|P| = |Q_1 \Rightarrow Q_2| = 0$.

($b_4$) $P = Q_1 \vee Q_2$ or $Q_1 \wedge Q_2$. Using the tautologies that express $\wedge$ and $\vee$
in terms of $\Rightarrow$ and $\neg$, we can reduce to the previous cases; we omit the details.

($b_5$) $P = \forall x\, Q$. If x does not occur freely in Q, then |P| = 1 is equivalent to
|Q| = 1, i.e., by the induction assumption, to $Q \in \mathcal{E}$. But $Q \in \mathcal{E}$ is equivalent
to $\forall x\, Q \in \mathcal{E}$, in one direction using Gen and in the other direction using the
axiom of specialization with t = x and then MP.

We now assume that x occurs freely in Q. We first suppose that |P| = 1
but $P \notin \mathcal{E}$, and obtain a contradiction. If $P \notin \mathcal{E}$, then $\neg P \in \mathcal{E}$, i.e., $\neg\forall x\, Q(x) \in \mathcal{E}$.
Since the alphabet of L is sufficient for $\mathcal{E}$, it follows that $\mathcal{E}$ contains the
formula $\neg\forall x\, Q(x) \Rightarrow \neg Q(c_Q)$. Applying MP, we obtain $\mathcal{E} \vdash \neg Q(c_Q)$; since $\mathcal{E}$
is consistent, we have $Q(c_Q) \notin \mathcal{E}$. By the induction assumption, $|Q(c_Q)| = 0$
($Q(c_Q)$ is closed!). This means that $|Q(x)|(\xi) = 0$ for $\xi \in \overline{M}$ if $x^\xi = \bar c_Q$,
contradicting the assumption that |P| = 1.

We now suppose that |P| = 0 but $P \in \mathcal{E}$, and obtain a contradiction. Since
|P| = 0, for some $\xi \in \overline{M}$ we have $|Q(x)|(\xi) = 0$. Let t be the constant term for
which $x^\xi = \bar t$. Clearly t is free for x in Q, so that $0 = |Q(x)|(\xi) = |Q(t)|$. Hence
$Q(t) \notin \mathcal{E}$ by the induction assumption, and $\neg Q(t) \in \mathcal{E}$ since $\mathcal{E}$ is complete.
On the other hand, if $P \in \mathcal{E}$, i.e., $\forall x\, Q \in \mathcal{E}$, then the axiom of specialization
$\forall x\, Q(x) \Rightarrow Q(t)$ gives us $\mathcal{E} \vdash Q(t)$. But since $\neg Q(t) \in \mathcal{E}$, this contradicts the
consistency of $\mathcal{E}$.

($b_6$) $P = \exists x\, Q$. This reduces to the previous case using the axiom that
expresses $\exists$ in terms of $\forall$ and negation; we omit the details.


6.10. PROOF OF LEMMA 6.6. In order to embed ℰ in a complete and consistent
set ℰ', we shall have to use Zorn’s lemma and the deduction lemma for L (see
Section 4.5 of Chapter II). Zorn’s lemma will be applied to the set Cℰ = the
set of sets of formulas ℰ' in L that contain ℰ and are consistent. The set Cℰ is
ordered by inclusion.


VERIFICATION OF THE HYPOTHESIS OF ZORN’S LEMMA. Let {ℰ_α}, α ∈ I, be a
linearly ordered subset of Cℰ, i.e., for any α and β we have either ℰ_α ⊂ ℰ_β or
ℰ_β ⊂ ℰ_α. Then the union ∪ℰ_α belongs to Cℰ. In fact, otherwise ∪ℰ_α would be
inconsistent, and there would exist a deduction of a contradiction from a finite
number of formulas. Suppose these formulas are contained in ℰ_{α₁}, ..., ℰ_{αₙ}.
But one of these sets contains the remaining n − 1; this set would be
inconsistent, contrary to the definition of Cℰ.


PROOF OF LEMMA 6.6 FROM ZORN’S LEMMA. The set Cℰ has a maximal
element, i.e., a consistent set ℰ' ⊃ ℰ such that if Q ∉ ℰ' then ℰ' ∪ {Q} is
inconsistent. We claim that ℰ' is complete. In fact, suppose that there were
a closed formula P such that P ∉ ℰ' and ¬P ∉ ℰ'. Since ℰ' is maximal,
it follows that ℰ' ∪ {P} ⊢ R and ℰ' ∪ {¬P} ⊢ R for any formula R. By
the deduction lemma, ℰ' ⊢ P ⇒ R and ℰ' ⊢ ¬P ⇒ R. Using the tautology
(P ⇒ R) ⇒ ((¬P ⇒ R) ⇒ R) and MP, we have ℰ' ⊢ R, contradicting the
consistency of ℰ'.


6.11. PROOF OF LEMMA 6.7. In constructing a language with a sufficient
alphabet for a consistent set of formulas ℰ' that contains ℰ and Ax L', we
proceed in the most natural way.

(a) We add to the alphabet of L a set of new constants whose cardinality is
that of the alphabet of L + ℵ₀. We obtain a language L'.

(b) We consider the set of formulas ℰ ∪ Ax L' in the language L', where
Ax L' consists of all the logical axioms of L'. We claim that this set of formulas
is consistent. In fact, if there were a deduction of a contradiction from ℰ ∪ Ax L'
in L', then the following procedure would transform it into a deduction of a
contradiction from ℰ in L: take the finite set consisting of all the new constants
that occur in the formulas in the deduction and replace these constants by
old variables (in L) that do not occur in the formulas in the deduction. It is
easily verified that the deduction of a contradiction remains a deduction of a
contradiction, and now lies entirely in L.

(c) We consider the set S of formulas P(x) containing one free variable x
and such that ¬∀x P(x) ∈ ℰ ∪ Ax L'. For each P(x) in S we choose a new
constant c_P subject to the following restriction: each c_P can be assigned a
natural number, its rank, in such a way that if a constant of rank n occurs in
P(x) then c_P has rank > n. This can be done since card(S) ≤ card(alphabet of
L') = card(alphabet of L) + ℵ₀. For each P(x) in S define the formula


R_P: ¬∀x P(x) ⇒ ¬P(c_P)


and finally let

ℰ' = ℰ ∪ Ax L' ∪ {R_P | P(x) ∈ S}.


Call any R_P an R-formula. Note that no R-formula has the form ¬∀x P(x), so
that L' is sufficient for ℰ'. It remains only to verify that ℰ' is consistent. If a
contradiction were deducible from ℰ' then it would be deducible using finitely
many R-formulas. At least one R_P among these must be such that c_P does not
occur in any of the others: namely, pick c_P of maximal rank. Hence it suffices
to verify that if ℰ ∪ Ax L' ∪ ℛ is consistent, where ℛ is a set of formulas not
containing c_P, then the addition of R_P does not lead to a contradiction.

Suppose ℰ ∪ Ax L' ∪ ℛ ∪ {R_P} were inconsistent. Then, in particular, we
would have a deduction of ¬R_P and, by the deduction lemma,
ℰ ∪ Ax L' ∪ ℛ ⊢ R_P ⇒ ¬R_P. The tautology (R_P ⇒ ¬R_P) ⇒ ¬R_P and MP would
yield a deduction of ¬R_P; that is,


ℰ ∪ Ax L' ∪ ℛ ⊢ ¬(¬∀x P(x) ⇒ ¬P(c_P)).


Then the tautology ¬(P ⇒ ¬Q) ⇒ Q and MP would yield a deduction of
P(c_P). Transform this deduction by replacing the constant c_P with a variable
y that does not occur in the formulas in the deduction. Since c_P does not
occur in ℛ it is easily verified that the transformation yields a deduction of
P(y) from ℰ ∪ Ax L' ∪ ℛ. Using Gen, ℰ ∪ Ax L' ∪ ℛ ⊢ ∀y P(y). But since
¬∀x P(x) ∈ ℰ ∪ Ax L', we have ℰ ∪ Ax L' ⊢ ¬∀y P(y). Hence ℰ ∪ Ax L' ∪ ℛ is
inconsistent, contrary to hypothesis.


6.12. PROOF OF LEMMA 6.8. Let L be a language in the class ℒ₁, and let ℰ
be a set of formulas in L. We embed ℰ in a complete and consistent set ℰ',
and then apply Lemma 6.7 to (L, ℰ'). We let L* and ℰ* denote the resulting
language and set of formulas. We further define inductively


(L^(0), ℰ^(0)) = (L, ℰ),    (L^(i+1), ℰ^(i+1)) = (L^(i)*, ℰ^(i)*),


and finally


L^(∞) = ∪_{i≥0} L^(i),    ℰ^(∞) = ∪_{i≥0} ℰ^(i).


The set ℰ^(∞) is consistent, since any deduction of a contradiction would be
obtained “at some finite level,” and all the ℰ^(i) are consistent. It is complete,
since every closed formula in L^(∞) is written in the alphabet of L^(i) for some i,
and ℰ^(i+1) contains the completion of ℰ^(i) in L^(i). Finally, the alphabet of L^(∞)
is sufficient for ℰ^(∞) by the same argument.


This completes the proof of the lemmas.


6.13. DEDUCTION OF THEOREM 6.2 FROM THE LEMMAS. Let T be a Gödelian
set of formulas in L. Applying Lemma 6.8 to T, we embed (L, T) in (L^(∞), T^(∞)),
where the pair (L^(∞), T^(∞)) satisfies Lemma 6.5. Let φ^(∞) be an interpretation
of L^(∞) such as must exist by Lemma 6.5. The cardinality of M^(∞) does not
exceed card(alphabet of L) + ℵ₀. The restriction φ of φ^(∞) to L satisfies the
condition T ⊂ T_φ L. We prove that T = T_φ L. In fact, let P ∈ T_φ L. If P is closed,
then P ∈ T, since either P or ¬P lies in T by completeness, and ¬P ∉ T
because P is φ-true. If P is not closed, and x₁, ..., xₙ are the variables that
occur freely in P, then ∀x₁ ··· ∀xₙ P is closed and belongs to T. By the axiom of
specialization, P is deducible from T ∪ {∀x₁ ··· ∀xₙ P}, so that P ∈ T, since T is
closed under deduction. This proves the first assertion of the theorem.

The second assertion follows from the analogous argument applied to
ℰ instead of T. We find a model φ for ℰ; then ℰ ⊂ T_φ L and T_φ L is
Gödelian.


6.14. In conclusion, we note that if the alphabet of L contains a symbol =
for which the axioms of equality are included in ℰ (or T), then there exists
a normal interpretation that satisfies Theorem 6.2 and takes = into equality.
To prove this, we take the above model M and divide out by the equivalence
relation φ(=), as in Section 4.6.


7 Countable Models and Skolem’s Paradox


“I know what you’re thinking about,” said
Tweedledum: “but it isn’t so, nohow.”
“Contrariwise,” continued Tweedledee, “if it
was so, it might be; and if it were so, it would
be: but as it isn’t, it ain’t. That’s logic.”
Lewis Carroll, Through the Looking Glass


7.1. In this section we discuss the technique of “cutting down” models, in
particular, models for L₁Set. Let L be a language in ℒ₁, let M ⊂ N be two sets (or
classes in V), and let φ and ψ be interpretations of L in M and N, respectively,
that are compatible in the obvious sense, so that ψ is an extension of φ. We
have a natural embedding of interpretation classes M̄ ⊂ N̄.


7.2. Definition. A formula P in L is called (M, N)-absolute if for all ξ ∈ M̄
we have

|P|_M(ξ) = |P|_N(ξ).


(We write | |_M instead of | |_φ, and so on.)


The property of being absolute is usually used as follows: if P is absolute,
and is also N-true, then it is automatically M-true. A formula P often fails to
be absolute for the following reason: a formula P = ∃x Q(x) can be N-true, so
that N has an object with the property Q, but not M-true, because no such
object lies in M. The proof of the following assertion shows how to handle this
situation.


7.3. Proposition. Let ℰ be a set of formulas in L, let ψ be an interpretation of
L in N, and let M₀ ⊂ N be a subset. Then there exists a set M, M₀ ⊂ M ⊂ N,
having cardinality ≤ card M₀ + card ℰ + ℵ₀, such that all the formulas in ℰ
are (M, N)-absolute.


7.4. Corollary (Löwenheim–Skolem). If the alphabet of L is countable and N
is a model for ℰ, then N has a countable submodel for ℰ.


The corollary follows from Proposition 7.3 if we construct a countable 
submodel with respect to which all the formulas of L are absolute, and in 
particular, in which all formulas that were true before remain true. 


PROOF OF 7.3. Suppose the set Mᵢ ⊂ N, i ≥ 0, has already been defined. Set

Mᵢ₊₁ = Mᵢ ∪ {x^{ξ'} | ξ' = ξ'(x, P, ξ)},


where x runs through the variables in L, P runs through the subformulas of the
formulas in ℰ, and ξ runs through the points of M̄ᵢ, and where for each fixed
triple (x, P, ξ), ξ'(x, P, ξ) is any one variation of ξ along x for which
|P|_N(ξ') = 1, if such a variation exists; otherwise, the triple does not make
any contribution to Mᵢ₊₁.

Further, set M = ∪_{i≥0} Mᵢ. M clearly has the desired cardinality. We now
show that all subformulas of the formulas in ℰ are (M, N)-absolute. We use
induction on the number of quantifiers and connectives in the formula. The
result is obvious for atomic formulas; the inductive step when a new formula
is constructed using a connective is also clear. The quantifier ∀ reduces to ∃ in
the usual way.

Thus, suppose P is absolute. We show that ∃x P is also absolute. It suffices
to consider the case that x occurs freely in P. For ξ ∈ M̄ we have


|∃x P|_N(ξ) = 1, if there exists a variation ξ' ∈ N̄ of ξ along x
                 with |P|_N(ξ') = 1,
              0, otherwise;


|∃x P|_M(ξ) = 1, if there exists a variation ξ'' ∈ M̄ of ξ along x
                 with |P|_N(ξ'') = 1,
              0, otherwise.

But the conditions on the right are equivalent. In fact, there exists a variation
η of the point ξ along variables that do not occur freely in P, such that η ∈ M̄ᵢ
for some i. Then in the case |∃x P|_N(ξ) = |∃x P|_N(η) = 1 there is a ξ' ∈ N̄
with |P|_N(ξ') = 1 ⇒ there is an η' ∈ M̄ᵢ₊₁ with |P|_N(η') = 1, where η'
is a variation of η along x, by the construction of Mᵢ₊₁. This completes the
proof.
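
The construction of the sets Mᵢ can be tried out on a finite structure. The
following Python sketch is only an illustration under assumptions that are not
in the text: formulas are encoded as nested tuples, N is the set {0, ..., 15}
with one binary relation E(x, y) meaning y = ⌊x/2⌋, and only variables that
occur freely in a subformula are varied. All function names are hypothetical.

```python
from itertools import product

# Assumed encoding of formulas as nested tuples:
#   ("atom", rel, (var, ...)), ("not", P), ("and", P, Q), ("exists", var, P)

def free_vars(f):
    if f[0] == "atom":
        return set(f[2])
    if f[0] == "exists":
        return free_vars(f[2]) - {f[1]}
    return set().union(*(free_vars(g) for g in f[1:]))

def subformulas(f):
    yield f
    if f[0] == "exists":
        yield from subformulas(f[2])
    elif f[0] != "atom":
        for g in f[1:]:
            yield from subformulas(g)

def holds(f, env, domain, rels):
    """Truth value of f at the point env; quantifiers range over domain."""
    if f[0] == "atom":
        return tuple(env[v] for v in f[2]) in rels[f[1]]
    if f[0] == "not":
        return not holds(f[1], env, domain, rels)
    if f[0] == "and":
        return holds(f[1], env, domain, rels) and holds(f[2], env, domain, rels)
    return any(holds(f[2], {**env, f[1]: a}, domain, rels) for a in domain)

def skolem_hull(m0, big, rels, formulas):
    """Close m0 under one witness per triple (x, P, point), as in the proof."""
    universe = sorted(big)
    subs = [s for f in formulas for s in subformulas(f)]
    m = set(m0)
    while True:
        new = set()
        for p in subs:
            fv = sorted(free_vars(p))
            for x in fv:
                others = [v for v in fv if v != x]
                for vals in product(sorted(m), repeat=len(others)):
                    env = dict(zip(others, vals))
                    witnesses = [a for a in universe
                                 if holds(p, {**env, x: a}, big, rels)]
                    # Add a witness only if none is already available in m.
                    if witnesses and not any(a in m for a in witnesses):
                        new.add(witnesses[0])
        if not new:
            return m
        m |= new

big = range(16)
rels = {"E": {(a, a // 2) for a in big}}
f = ("exists", "y", ("atom", "E", ("x", "y")))
hull = skolem_hull({5}, big, rels, [f])
```

Starting from M₀ = {5}, the closure stops at a proper subset of N, and the
formula ∃y E(x, y) takes the same truth values on it whether the quantifier
ranges over the subset or over all of N.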


7.5. We now apply Corollary 7.4 to the standard interpretation of L₁Set in
the von Neumann universe V and the set ℰ of Zermelo–Fraenkel axioms.
We obtain a countable model N for this axiom system, but this model has
one defect: if X ∈ N, some elements in X might not themselves belong to N,
i.e., N is not necessarily transitive. The following result of Mostowski shows how
to replace N by a transitive countable model.

Let N ⊂ V be a subclass, and let ε ⊂ N × N be a binary relation. We shall
write X ε Y instead of (X, Y) ∈ ε. For any X ∈ N we set


[X] = {Y | Y ε X}.


Suppose that [X] ∈ V for all X ∈ N, i.e., each [X] is a set rather than a class.
We consider the interpretation φ of L₁Set in the class N for which φ(∈) is ε
and φ(=) is equality.


7.6. Proposition (Mostowski). Suppose that the axiom of extensionality and
the axiom of the empty set are φ-true, and that N does not contain any
infinite chain ··· Xₙ ε Xₙ₋₁ ε ··· ε X₁ ε X₀. Then there exist a unique transitive
class M ⊂ V and a unique isomorphism f : (N, ε) → (M, ∈).


If we apply this proposition to the countable model (N, ∈) for the
Zermelo–Fraenkel axioms in Section 7.5, we obtain a transitive countable model
(M, ∈), that is, a “small universe.” (The condition that all ∈-chains are finite
holds even in V, as well as in N; [X] is the subset X ∩ N ⊂ X, and hence is an
element of V.)


7.7. PROOF OF PROPOSITION 7.6. Using transfinite induction, for every
ordinal α we construct sets N_α ⊂ N, M_α ⊂ V and compatible isomorphisms
f_α : (N_α, ε|N_α) → (M_α, ∈|M_α), and we show that ∪N_α = N.


(a) Since the axiom of extensionality is φ-true and φ(=) is equality, we easily
obtain X₁ = X₂ ⇔ [X₁] = [X₂] for all X₁, X₂ ∈ N. Let ∅_N ∈ N be the
interpretation of the constant ∅ of the language L₁Set. Since the axiom of the
empty set is φ-true, we may conclude that ∅_N is the unique element of N for
which [∅_N] = ∅ ∈ V. We set


N₀ = {∅_N},    M₀ = {∅},    f₀(∅_N) = ∅.


(b) Recursive construction. Let α be an ordinal. Suppose that N_α, M_α, and f_α
have already been constructed. We set


N_{α+1} = {X ∈ N | [X] ⊂ N_α ∧ X ∉ N_α} ∪ N_α;
f_{α+1}(X) = {f_α(Y) | Y ∈ [X]} for X ∈ N_{α+1} \ N_α;    f_{α+1}|N_α = f_α;
M_{α+1} = image of f_{α+1} = range of f_{α+1}.

If β is a limit ordinal, we set N_β = ∪_{α<β} N_α, M_β = ∪_{α<β} M_α, and
f_β = ∪_{α<β} f_α. Finally, we set M = ∪M_α and f = ∪f_α, where the union is
taken over all the ordinals.


(c) Inductive proof. We verify that for each α,


(c₁) N_α is a set, i.e., N_α ∈ V.

(c₂) M_α is a transitive subset of V.

(c₃) f_α is an isomorphism of N_α with M_α, taking ε to ∈.

(c₄) N = ∪_α N_α.


Assertions (c₁)–(c₃) are obvious for α = 0. If they hold for all α < β and if β is
a limit ordinal, then they also hold for β. It remains to check the step from
α to α + 1.

(c₁) [ ] is obviously a function from N_{α+1} \ N_α to 𝒫(N_α); since the axiom
of extensionality is true, there exists an inverse function. Its image N_{α+1} \ N_α
is a set, since N_α, and therefore 𝒫(N_α), are sets by the induction assumption.

(c₂) Any element in M_{α+1} \ M_α has the form {f_α(Y) | Y ∈ [X]}, where
X ∈ N_{α+1} \ N_α. But then [X] ⊂ N_α. Hence, an element f_α(Y) of this element
of M_{α+1} \ M_α belongs to the image of f_α, i.e., to the set M_α ⊂ M_{α+1}. This
proves the transitivity of M_{α+1}.

(c₃) We first verify that f_{α+1} is a bijection. The surjectivity is obvious;
using the induction assumption, we see that it suffices to verify injectivity on
N_{α+1} \ N_α. But if X₁, X₂ ∈ N_{α+1} \ N_α and f_{α+1}(X₁) = f_{α+1}(X₂), then


{f_α(Y) | Y ∈ [X₁]} = {f_α(Y) | Y ∈ [X₂]}.


Since f_α is injective, we obtain [X₁] = [X₂], so that X₁ = X₂.
We then obtain


Y ε X ⇔ Y ∈ [X] ⇔ f_α(Y) ∈ f_{α+1}(X),


so that for X ∈ N_{α+1} \ N_α the relation Y ε X goes to f_{α+1}(Y) ∈ f_{α+1}(X). This
is clearly sufficient to complete the induction.

(c₄) Finally, we verify that N = ∪N_α. Let N' = N \ ∪N_α; we suppose that
N' is nonempty and show that this leads to a contradiction. If there existed an
X ∈ N' such that [X] ∩ N' = ∅, then we would have [X] ∩ N ⊂ ∪N_α; then
[X] ⊂ N_{α₀} for some α₀, so that X ∈ N_{α₀+1}, contradicting the assumption that
X ∈ N \ ∪N_α. On the other hand, if we had [X₀] ∩ N' ≠ ∅ for all X₀ ∈ N',
then, successively choosing Xₙ₊₁ ∈ [Xₙ] ∩ N', we would obtain an infinite chain
··· Xₙ₊₁ ε Xₙ ε Xₙ₋₁ ε ··· ε X₀, contradicting the hypothesis of the theorem.


(d) Suppose we have two transitive subclasses M and M', and an isomorphism
g : (M, ∈) → (M', ∈). We set M_α = V_α ∩ M and M'_α = V_α ∩ M'. An obvious
induction on α then shows that g is the identity map. The proposition is
proved.
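
For a finite set N the recursion in (b) terminates after finitely many steps and
can be carried out directly. The Python sketch below is an illustration only: the
dictionary `members`, which sends each X to the set [X], and the function name
`mostowski_collapse` are assumptions introduced here, and hereditarily finite
sets are modeled by nested frozensets.

```python
def mostowski_collapse(members):
    """members[X] is the set [X] = {Y | Y eps X}; returns f as a dict.

    Each pass of the loop plays the role of the step from N_alpha to
    N_(alpha+1): it handles every X whose elements have all been collapsed.
    """
    f = {}
    while len(f) < len(members):
        ready = [x for x in members
                 if x not in f and all(y in f for y in members[x])]
        if not ready:
            # Some element sits on an infinite descending chain (a cycle).
            raise ValueError("relation is not well-founded on this set")
        stage = {x: frozenset(f[y] for y in members[x]) for x in ready}
        f.update(stage)
    return f

# A hypothetical extensional, well-founded example:
# a has no elements, b = {a}, c = {a, b}, d = {b}.
members = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"b"}}
f = mostowski_collapse(members)
empty = frozenset()
```

Here f sends a, b, c, d to ∅, {∅}, {∅, {∅}}, {{∅}}, respectively. The map is
injective because the example is extensional, and Y ε X holds exactly when
f(Y) ∈ f(X).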


7.8. Skolem’s paradox. Let M be a transitive countable model for the
Zermelo–Fraenkel axioms. Then the following formulas are M-true:


the axiom of infinity;
the power set axiom;
Cantor’s theorem that there is no mapping of x onto 𝒫(x) for any set x (this
theorem is deducible from the Zermelo–Fraenkel axioms).

Since 𝒫(X) is uncountable when X is countably infinite, the content of the
assertion that the power set axiom is true in the countable model M must be
very different from the content of the assertion that this axiom is V-true. In fact,
in L₁Set let “y = 𝒫(x)” be abbreviated notation for the formula ∀z(“z ⊂ x”
⇔ z ∈ y). Let ξ ∈ M̄, x^ξ = X ∈ M, and y^ξ = Y ∈ M. Then we easily
see that


|“y = 𝒫(x)”|_M(ξ) = 1 ⇔ Y = {Z | Z ⊂ X ∧ Z ∈ M},


i.e., 𝒫(X)_M = 𝒫(X) ∩ M plays the role of 𝒫(X) in M. Here 𝒫(X)_M is at
most countably infinite, since M is countable; so, from the usual point of
view, there exists a mapping of a countably infinite set X onto 𝒫(X)_M. This
does not contradict Cantor’s theorem, because the M-truth of Cantor’s
theorem merely means that there are no (graphs of) such mappings in the model
M. Such graphs may exist outside of M, but if we add such a graph to M
(along with everything that must be added for the axioms to remain true), we
thereby increase M, and at the same time 𝒫(X)_M, and the mapping stops being
onto.

All such ways in which statements of set theory change their meaning in 
countable models are customarily referred to as Skolem’s paradox. 

Cohen was the first who was able to use the properties of countable models
to prove the nondeducibility of the continuum hypothesis. In his models sets
of “M-intermediate” cardinality lie between ω₀ and 𝒫(ω₀)_M, although from an
external point of view both ω₀ and 𝒫(ω₀)_M, along with all the other sets, are
simply countable. Cohen introduced fundamentally new ideas of relativizing the
very notion of truth, and it is only with the benefit of hindsight that we can so
easily understand the situation in his models. For details, see Chapter III.

Skolem himself, and other specialists on the foundations of mathematics,
were willing to work with countably infinite sets, but not with larger infinities.
They considered Skolem’s paradox to be a manifestation of the relative
character of set-theoretic concepts. In particular, they considered that there exist
“different continua” 𝒫(ω₀)_M, none of which coincides with the “real” 𝒫(ω₀).

From the point of view of the topologist or analyst, for whom the contin- 
uum is a working reality, the existence of countable models means that for- 
mal language has limitations as a means of imitating intuitive reasoning. We 
encountered similar limitations when discussing the formal axioms of induction 
in §4. 

For the psychologist or philosopher, perhaps the most interesting aspect of 
the situation is that any mathematician can understand the viewpoint of ano- 
ther mathematician (without having to agree with it). This means that what 
mathematician A says, though demonstrably incapable of conveying unambigu- 
ous information about the continuum, nevertheless is capable of bringing the 
brain of mathematician B to the point where it forms an idea of the continuum 
that adequately represents the idea in A’s brain. Then B is still free to reject
this idea.

“I know what you’re thinking about,” said Tweedledum: “but it isn’t so,
nohow.”


8 Language Extensions 


8.1. In this section we study the formal version of “introducing new notation.” 
Here we consider only names of new functions and constants that are “demon- 
strably definable” in the language. Adding such names to the alphabet short- 
ens formulas and formal deductions, but does not increase the set of deducible 
formulas—this will be the fundamental theorem of this section. 

Of course, in practice, abbreviated notation and well-chosen new names can 
immediately make accessible to our intuition entire areas of mathematical facts 
that were previously inaccessible. One of the best-known examples is the groups 
introduced by Galois to study equations. In 1924, commenting on the attempt 
to curb the inflation in Germany by introducing a new unit of currency, the 
Rentenmark, Hilbert remarked skeptically, “A problem cannot be solved by 
renaming the independent variable.” But as his biographer Constance Reid 
noted, Hilbert was wrong: the economic situation gradually stabilized. 

We start with the following data. 


8.2. Let L' be a language in ℒ₁ with equality and with an infinite set of variables,
and let P'(x) be a formula in L' in which x occurs freely. We recall that the
abbreviated notation ∃!x P'(x) (read: “there exists a unique x with the property
P'”) stands for the formula


∃x P'(x) ∧ ∀x∀y(P'(x) ∧ P'(y) ⇒ x = y).


Let ℰ' be a set of formulas in L' that contains Ax L', the axioms of equality,
and perhaps some special axioms. Suppose that the formula ∃!x P'(x, y₁, ..., yₙ)
is deducible from ℰ', where P' has no free variables other than x, y₁, ..., yₙ.
Intuitively, this means that P' defines x as an implicit function of y₁, ..., yₙ;
and in the informal text we can introduce a new notation for this function,
say, x = f(y₁, ..., yₙ), and then always use that notation. Now
we give the formal version of this procedure.


8.3. Proposition. Under the conditions in 8.2, let L denote the language in ℒ₁
whose alphabet is obtained from the alphabet of L' by adding a new operation
symbol f of degree n if n ≥ 1, or a constant f if n = 0. Let ℰ be the smallest set
of formulas in L containing Ax L, the axioms of equality, ℰ', and the formula

P'(f(y₁, ..., yₙ), y₁, ..., yₙ).

Then there exists an explicitly describable map from the set of formulas of
the (richer) language L to the set of formulas of the (poorer) language L' that
correlates with each Q a translation Q' and that has the following properties:


(a) If f does not occur in Q, then the translation of Q coincides with Q.
(b) If Q is deducible from ℰ in L, then Q' is deducible from ℰ' in L'. In
particular, the set of formulas in L' that are deducible from ℰ' in L' coincides
with the set of formulas in L that do not contain f and are deducible from
ℰ in L.


PROOF. 


Translation of formulas. Suppose n ≥ 1. (The case n = 0 is analogous, and
is simpler, so we shall omit it.) The first effect of adding f is to increase the
set of terms: L includes terms of the form f(t₁, ..., tₙ), where f can occur in
t₁, ..., tₙ, and so on. In order to decrease the number of references to f, we must
say “f(t₁, ..., tₙ)” in a roundabout way: “that x for which P'(x, t₁, ..., tₙ).”
This is the basic idea behind the translation of formulas. We now give a precise
inductive definition.


(a) A term f(t₁, ..., tₙ) is called a simple f-term if f does not occur in t₁, ..., tₙ.

(b) Let Q be an atomic formula in L. If f does not occur in Q, we let Q be its
own translation. If f occurs in Q, then there exists a simple f-term f(t₁, ..., tₙ)
that occurs in Q. We take the very first occurrence of a simple f-term in Q,
then take a variable symbol x that does not occur in Q, substitute it in place
of this occurrence, thereby obtaining a formula Q*, and finally construct the
formula


Q^(1): ∃x(P'(x, t₁, ..., tₙ) ∧ Q*(x)).


We apply this procedure to Q^(1) to obtain Q^(2), and so on. After a finite number
of steps we obtain a formula Q^(k) = Q' in which f does not occur. This Q' is
the translation of Q.


(c) If Q is not an atomic formula, it has the form ¬Q₁, or Q₁ * Q₂ (where *
is a connective), or else ∀y Q₁ or ∃y Q₁. In all cases Q is translated
automatically using the translations of Q₁, Q₂, i.e., by applying “from Q
produce Q'” to the component parts.
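
The inductive definition (a)–(c) can be sketched in Python. Everything about
the encoding is an assumption made for this illustration: terms are variable
names or tuples (symbol, t₁, ..., tₙ), the new symbol is literally named "f",
relation symbols are assumed to differ from "f", the atom ("P", x, t₁, ..., tₙ)
stands for P'(x, t₁, ..., tₙ), and the fresh variables x1, x2, ... are assumed
not to occur in Q. The recursion is applied to Q* rather than to Q^(1), which
yields the same nesting of existential quantifiers.

```python
from itertools import count

def contains_f(t):
    return isinstance(t, tuple) and (
        t[0] == "f" or any(contains_f(s) for s in t[1:]))

def replace_first(t, x):
    """Replace the first simple f-term inside t by the variable x.

    Returns (new_term, arguments_of_that_f_term), or None if f is absent.
    """
    if not isinstance(t, tuple):
        return None
    if t[0] == "f" and not any(contains_f(s) for s in t[1:]):
        return x, t[1:]
    for i in range(1, len(t)):
        hit = replace_first(t[i], x)
        if hit is not None:
            return t[:i] + (hit[0],) + t[i + 1:], hit[1]
    return None

def translate_atom(q, fresh):
    _, rel, terms = q
    wrapped = (rel,) + tuple(terms)   # treat the atom like a term headed by rel
    if not contains_f(wrapped):
        return q                      # f does not occur: Q is its own translation
    x = next(fresh)
    new_wrapped, f_args = replace_first(wrapped, x)
    q_star = ("atom", rel, new_wrapped[1:])
    return ("exists", x,
            ("and", ("P", x) + f_args, translate_atom(q_star, fresh)))

def translate(q, fresh):
    if q[0] == "atom":
        return translate_atom(q, fresh)
    if q[0] == "not":
        return ("not", translate(q[1], fresh))
    if q[0] in ("forall", "exists"):
        return (q[0], q[1], translate(q[2], fresh))
    return (q[0], translate(q[1], fresh), translate(q[2], fresh))

fresh = (f"x{i}" for i in count(1))
q = ("atom", "=", (("f", ("f", "y")), "z"))      # the formula f(f(y)) = z
q_prime = translate(q, fresh)
```

For n = 1 the formula f(f(y)) = z is translated to
∃x1(P'(x1, y) ∧ ∃x2(P'(x2, x1) ∧ x2 = z)): the inner simple f-term f(y) is
removed first, and the term f(x1) that remains is removed in the second step.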


Translation of deductions. The problem is the following: Let Q₁, ..., Qₙ = Q
be a deduction of Q from ℰ, and let Q' be the translation of Q. We must
construct a deduction of Q' from ℰ'. The most obvious idea is to write the sequence
of translations Q'₁, ..., Q'ₙ. Why isn’t this a deduction of Q' from ℰ', since MP
and Gen are translated in a trivial way, and tautologies are translated as
tautologies? Because, for example, the logical axiom ∀x R(x) ⇒ R(f) might appear
in this sequence, and this formula stops being an axiom after it is translated,
if f occurs in R. Hence, we must fill in the sequence Q'₁, ..., Q'ₙ by adding
deductions from ℰ' of certain of its terms. This is a rather cumbersome
combinatorial procedure, which one can read in §74 of Kleene’s book Introduction
to Metamathematics (Van Nostrand, New York–Toronto, 1952). (The moral of
the story is that new notation really does economize on time and space.)


Instead of using this procedure, we shall give an ineffective proof that ℰ' ⊢ Q'
using the deducibility criterion in 6.3. We state this criterion once more:


(a) If Q' is true in any model of ℰ', then ℰ' ⊢ Q'. Since ℰ' contains the axioms
of equality, we can slightly strengthen this as follows:

(b) If Q' is true in any normal model of ℰ', then Q' is true in any model of ℰ'.


Recall that = is interpreted as equality in a normal model. On the other hand, 
in §4 we showed that in any model = is interpreted as an equivalence relation 
that is compatible with the interpretation of all the constants, functions, and 
relations. Factoring out by this equivalence relation leads to a normal model, 
in which the truth values of all the formulas remain as before. 


(c) The normal models of ℰ' (in the language L') coincide with the normal
models of ℰ (in the language L).


More precisely, we can give the following natural one-to-one correspondence
between them that preserves the truth function. We shall limit ourselves to the
case n ≥ 1. Let φ be a normal interpretation of L' in M for which |Q'|_φ = 1 for
all Q' ∈ ℰ'. In particular, since ℰ' ⊢ ∃!x P', we have


|∃!x P'(x, y₁, ..., yₙ)|_φ = 1.

Computing the truth value on the left at a point ξ ∈ M̄ and using the
normality of the model, we then find that to every n-tuple (y₁^ξ, ..., yₙ^ξ) ∈ Mⁿ there
corresponds a unique x^ξ ∈ M such that |P'(x^ξ, y₁^ξ, ..., yₙ^ξ)|_φ = 1 (this is not
the standard notation, but the meaning is clear). We now interpret the symbol
f (which is the new symbol in the language L) as the function Mⁿ → M that
takes (y₁^ξ, ..., yₙ^ξ) to x^ξ. We obviously obtain a normal model for ℰ in L.
Conversely, any normal model for ℰ can be restricted to L' to obtain a
normal model for ℰ'.

(d) If Q is deducible from ℰ in L, then Q' is true in any normal model for ℰ.


PROOF. Q is true in any model φ for ℰ. To prove that Q' is true, we begin with
atomic formulas Q that contain f. In the notation in the first part of the proof
(translation of formulas), we construct Q* and then
Q^(1) = ∃x(P'(x, t₁, ..., tₙ) ∧ Q*(x)). To verify that |Q^(1)|_φ = 1, for each
point ξ ∈ M̄ we must find a variation ξ' of ξ along x for which

|P'|_φ(ξ') = 1 and |Q*(x)|_φ(ξ') = 1.

We determine x^{ξ'} from the condition |P'(x^{ξ'}, t₁^ξ, ..., tₙ^ξ)|_φ = 1. The
description in (c) of the interpretation of f shows that we now have
|Q*|_φ(ξ') = |Q|_φ(ξ) = 1.

Thus, truth is preserved in going from Q to Q^(1). Repeating this procedure,
we find that Q' is true for atomic formulas Q. Finally, the truth of Q' in the
general case is proved by induction on the number of connectives and
quantifiers. Combining the results (a)–(d), we then obtain ℰ' ⊢ Q', which
completes the proof of Proposition 8.3.


8.4. EXAMPLES 


(a) In L₁Set the following formula is deducible from the axioms of extensionality
and pairing (and also the axioms of equality and the logical axioms):


∃!x ∀z(z ∈ x ⇔ z = u ∨ z = v).

Using Proposition 8.3, we see that we may add to L₁Set a new degree 2
function symbol {}, “unordered pair,” without changing the set of formulas in
L₁Set that are deducible from the Zermelo–Fraenkel axioms. Therefore, without
hesitation we may use not only the abbreviated notation “x = {u, v}” as before,
but also terms that are put together using the symbol {}. In particular (here
the use of {} is not normalized, but is in agreement with tradition):

(b) We can introduce notation for the finite ordinals


∅, {∅}, {∅, {∅}}, ...


as terms in their own right in our language extension, and then embed formal
arithmetic in formal set theory.


(c) After deducing the formula


∃!x(“x is an ordinal” ∧ “x is not finite” ∧ “∀ ordinal y < x, y is finite”)


from the Zermelo–Fraenkel axioms, we can introduce a new constant ω₀, and
then continue to introduce names of more and more ordinals that are
demonstrably uniquely characterized by formulas in L₁Set (or in language extensions
that are formed in the same way).

We shall make use of this new freedom of action in Chapter III. 


9 Undefinability of Truth: The Language SELF 


9.1. When modeled in formal languages, arguments of the “liar paradox” type
lead to important theorems on the limitations of the modes of expression and
proof in these languages. The best known of these theorems are Tarski’s theorem
on the undefinability of the set of true formulas and Gödel’s theorem on the
impossibility of effectively axiomatizing arithmetic.

The next three sections are devoted to Tarski’s theorem. Our presentation
is based on an excellent article by Smullyan (Languages in which self-reference
is possible, J. Symb. Logic, vol. 22, no. 1 (1957), 55-67).

In this section we describe the extremely elementary language SELF (which
does not belong to ℒ₁), which was designed to illustrate self-reference and which
graphically demonstrates the idea of such a construction. In §10 we introduce
the language SAr, which is just as expressive as L₁Ar, but does not belong to
ℒ₁. Its syntax is close to that of SELF, which greatly simplifies proofs. Finally,
in §11 we use a method of Smullyan to prove Tarski’s theorem for SAr.


9.2. The language SELF (Smullyan’s Easy Language For self-reference)

The alphabet of SELF: E, * (symmetric quotes), r (relation of degree 1),
¬ (negation).

The syntax of SELF. The distinguished expressions are labels, displays,
formulas, and names. The label of any expression P is *P* (“P in quotes”). The
display of any expression P is P*P* (“something with a label”). Formulas are
expressions of the form rE...E*P* or ¬rE...E*P*, where E appears k ≥ 0
times after r. We use the abbreviated notation rEᵏ*P* and ¬rEᵏ*P* for
formulas. Finally, we introduce the binary relation “is the name of” on the set
of all distinguished expressions. This relation is defined recursively:


(a) The label of P is a name of P.
(b) If P is a name of Q, then EP is a name of the display of Q, i.e., a name of
the expression Q*Q*.


9.3. Remarks 


(a) If P is a name of Q, then the display of Q has at least two different names:
EP and *Q*Q**. Thus, an expression can have several names. But
conversely, an expression is uniquely determined if we know its name; names
all have the form Eᵏ*P*, k ≥ 0. We shall write N(Q) in place of “one of
the names of Q.”

(b) Every formula has the form rN(Q) or ¬rN(Q). In 9.4 we interpret such a
formula as the statement, “The expression Q has (or does not have) the
property R,” and it is natural that the formula, in saying something about
Q, “calls Q by name.”

(c) The expression E*E* is one of two possible names for itself. In exactly the
same way, the formula rE*rE* “says something about itself” (see 9.5).
The language SELF was constructed precisely in order to produce these
effects of self-reference with the fewest possible modes of expression.


9.4. The standard interpretations. In order to give one of the standard
interpretations of the language SELF, we choose any set (property) R of expressions of
the language and introduce the truth function | |_R on the formulas by stipulating


1 − |¬rN(Q)|_R = |rN(Q)|_R = 1, if Q ∈ R,
                              0, otherwise.

We say that a formula is R-true (R-false) if the value of | |_R on the formula
equals 1 (resp. 0).


9.5. Undefinability Theorem. For any property R,


R ≠ {R-true formulas},    R ≠ {R-false formulas}.


PROOF.

(a) The formula Q = ¬rE*¬rE* is R-true ⇔ rE*¬rE* is R-false ⇔ Q ∉ R,
since E*¬rE* is a name of the display of ¬rE, i.e., a name of Q. Thus, Q
cannot both lie in R and be true, which proves the first part of the theorem. The
connection with the liar paradox becomes clear if we note that Q says about
itself, “I do not have the property R.”

(b) Analogously, the formula rE*rE* says about itself, “I have the property
R,” and so cannot both lie in R and be R-false.
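
The syntax of 9.2 and the truth function of 9.4 are small enough to be checked
mechanically. The following Python sketch is only an illustration: expressions
are modeled as strings, the character ~ stands in for ¬, a property R is
modeled as a set of strings, and the function names are hypothetical.

```python
def label(p):
    """The label of P: P in quotes."""
    return "*" + p + "*"

def display(p):
    """The display of P: P followed by its label."""
    return p + label(p)

def named(name):
    """Decode a name of the form E^k * P * into the expression it names."""
    k = len(name) - len(name.lstrip("E"))     # number of leading E's
    body = name[k:]
    if len(body) < 2 or body[0] != "*" or body[-1] != "*":
        raise ValueError("not a name: " + name)
    q = body[1:-1]                            # *P* names P itself
    for _ in range(k):                        # each E passes to the display
        q = display(q)
    return q

def truth(formula, R):
    """Value of the truth function | |_R on a formula rN(Q) or ~rN(Q)."""
    negated = formula.startswith("~")
    rest = formula[1:] if negated else formula
    if not rest.startswith("r"):
        raise ValueError("not a formula: " + formula)
    value = named(rest[1:]) in R
    return (not value) if negated else value

liar = "~rE*~rE*"      # says of itself: "I do not have the property R"
honest = "rE*rE*"      # says of itself: "I have the property R"
```

For every choice of R, `truth(liar, R)` equals `liar not in R` and
`truth(honest, R)` equals `honest in R`, which is the content of parts (a) and
(b) of the proof; `named("E*E*")` returns `"E*E*"`, as in Remark 9.3(c).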


10 Smullyan’s Language of Arithmetic


10.1. In this section we describe the language of arithmetic SAr and its
standard interpretation. The main difference between SAr and L₁Ar is that
in SAr we are allowed to form “class terms,” that is, names of certain sets of
natural numbers. More precisely, if P(x) is a formula in SAr with one free
variable x, then the expression x(P(x)) in SAr names the set {x ∈ N | P(x) is
true}, and the expression x(P(x))k̄, where the term k̄ is a name for an integer
k ≥ 1, is a name for the statement “k satisfies P.” The greater richness of the
modes of expression in SAr, as opposed to L₁Ar, does not increase the class of
subsets in ∪_{r≥1} N^r that are definable by formulas. But it brings the syntax of
SAr so close to that of SELF that we can imitate the proof of Theorem 9.5.

In addition, the alphabet of SAr is somewhat altered and shortened in
comparison with the alphabet of L₁Ar, but this is done only in order to simplify
the description of the syntax. These changes do not make the logic of SAr any
poorer.


10.2. The alphabet of SAr: $x$ (a variable); $'$ (used to form a countable set of
variables $x, x', x'', \ldots$); $\cdot$ (multiplication, a degree-2 operation);
$\uparrow$ (raising to a power, a degree-2 operation, as in Algol); $=$ (equality);
$\downarrow$ (a connective, the conjunction of negations); $($, $)$ (parentheses); and
$1$ (the constant one).


10.3. The syntax and interpretation of SAr. Because we are allowed to form the
class terms $x(P(x))$ and the formulas $x(P(x))\bar{k}$, the syntax is more
complicated than in languages of $\mathcal{L}_1$. We use induction on the integer
$i \ge 0$ to define two sequences of sets of expressions: $Tm_{2i}$ (terms of rank
$\le 2i$) and $Fl_{2i+1}$ (formulas of rank $\le 2i+1$). (Using double induction, on
the rank of the term or formula and, within the set $Tm_{2i}$ or $Fl_{2i+1}$, on the
length of the term or formula, one can prove a unique reading lemma; this lemma
is the basis for defining free and bound occurrences of variables and truth
functions. However, since there is nothing new here beyond what was done in §1,
we leave the details to the reader.)

Along with our description of the syntax, we give a parallel description of
the standard interpretation of SAr in $\mathbf{N}$. In order to interpret expressions
with free variables, we must fix a point
$\xi \in \mathbf{N}^\infty = \mathbf{N} \times \mathbf{N} \times \mathbf{N} \times \cdots$, which we shall
identify with the corresponding infinite vector with natural number
coordinates. Here the value of the kth variable $(x^{\prime\cdots\prime})^\xi$
($k - 1$ primes) is in the kth place in the vector.

($a_0$) $Tm_0$ is the set of numerical terms, i.e., the least set of expressions
that contains the variables $x, x', x'', \ldots$ and the names of the natural
numbers $1, 11, 111, \ldots$ and is closed with respect to forming the expressions
$(t_1) \cdot (t_2)$ and $(t_1) \uparrow (t_2)$, where $t_i \in Tm_0$.

Instead of $x^{\prime\cdots\prime}$ ($k - 1$ primes) we shall write $x_k$, and instead of
$1 \cdots 1$ ($k \ge 1$ ones) we shall write $\bar{k}$. The term $\bar{k}$ is interpreted
as $k$ (not depending on $\xi$); $x_k$ is interpreted as the kth coordinate of
$\xi$; and if $t_1^\xi, t_2^\xi \in \mathbf{N}$ have already been determined, then
$[(t_1) \cdot (t_2)]^\xi = t_1^\xi t_2^\xi$ and
$[(t_1) \uparrow (t_2)]^\xi = (t_1^\xi)^{t_2^\xi}$. The occurrences of the expressions
$x_k = x^{\prime\cdots\prime}$ in any term in $Tm_0$ are obviously independent of one
another. All such occurrences are considered free.

($b_0$) $Fl_1$ is the least set of expressions that contains all expressions of
the form $t_1 = t_2$ (where $t_i \in Tm_0$) and is closed with respect to forming
the expressions $(P_1) \downarrow (P_2)$, where $P_i \in Fl_1$. In other words, $Fl_1$
is the logical closure of the set of atomic formulas
$\{t_1 = t_2 \mid t_i \in Tm_0\}$.

Choosing a point $\xi$ determines a truth value for any formula $P \in Fl_1$ by
induction on the number of times $\downarrow$ occurs:

$$|t_1 = t_2|(\xi) =
\begin{cases} 1, & \text{if } t_1^\xi = t_2^\xi, \\ 0, & \text{otherwise;} \end{cases}$$

$$|(P_1) \downarrow (P_2)|(\xi) =
\begin{cases} 1, & \text{if } |P_1|(\xi) = |P_2|(\xi) = 0, \\ 0, & \text{otherwise.} \end{cases}$$

All occurrences of variables in elements of $Fl_1$ are independent of one
another, and are considered free.

Now let $i \ge 1$, and suppose that the sets $Tm_{2k-2}$, $Fl_{2k-1}$ are already
defined for $k \le i$ along with the interpretations and the division into free
and bound occurrences of variables. We define the next sets $Tm_{2i}$ and
$Fl_{2i+1}$ as follows.

($a_i$) $Tm_{2i}$ consists of the class terms of rank $\le 2i$:

$$Tm_{2i-2} \cup \{x_k(P) \mid k \ge 1,\ P \in Fl_{2i-1}\}$$

($Tm_0$ need not be included when $i = 1$). These elements have the following
interpretation:

$$(x_k(P))^\xi = \{\xi'_k \mid \xi' \text{ runs through the variations of } \xi
\text{ along } x_k \text{ for which } |P|(\xi') = 1\}.$$

All occurrences of the variable $x_k$ in $x_k(P)$ are considered bound, and the
occurrences of other variables remain the same (free or bound) as in P.
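($b_i$) $Fl_{2i+1}$ is the logical closure of the set of expressions

$$Fl_{2i-1} \cup \{x_k(P) = x_k(Q) \mid k \ge 1;\ P, Q \in Fl_{2i-1}\}
\cup \{T\bar{k} \mid k \ge 1,\ T \in Tm_{2i}\}.$$

The truth function is defined as follows: if we set $x_k(P) = T_1$ and
$x_k(Q) = T_2$, then

$$|x_k(P) = x_k(Q)|(\xi) =
\begin{cases} 1, & \text{if } T_1^\xi = T_2^\xi \text{ as subsets of } \mathbf{N}, \\
0, & \text{otherwise;} \end{cases}$$

$$|T\bar{k}|(\xi) =
\begin{cases} 1, & \text{if } k \in T^\xi, \\ 0, & \text{otherwise.} \end{cases}$$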


The function $|\ |$ is extended to the logical closure in the same way as in
($b_0$). All occurrences of variables in $x_k(P) = x_k(Q)$ and in $T\bar{k}$ are the
same (free or bound) as in the corresponding class term. Composition using the
connective $\downarrow$ does not change the nature of the occurrence. As in Section
2.10, one can prove that $|P|(\xi)$ depends only on the $\xi$-values of the
variables that have free occurrences in the formula
$P \in \bigcup_{i \ge 0} Fl_{2i+1}$.

This finishes the description of the syntax and semantics of SAr. 

In conclusion, we show that the classes of sets in $\bigcup_{r \ge 1} \mathbf{N}^r$ that
are definable by formulas in $L_1$Ar and in SAr coincide. This result is not used
in the proof of Tarski’s theorem in the next section. However, the result
itself and the method of proof are instructive, and we shall return to these
ideas in Part III of the book.

Let $L_1$Ar have a countable set of variables. If we denote them by
$x_1, x_2, \ldots, x_n, \ldots$ and identify $x_i$ with $x^{\prime\cdots\prime}$ ($i - 1$ primes),
we can also identify the interpretation classes for $L_1$Ar and SAr in the
obvious way. Our claim that the classes of definable sets coincide is then an
immediate consequence of the following stronger fact:


10.4. Proposition. Two translation mappings

$$\{\text{formulas of } L_1\text{Ar}\} \rightleftarrows \{\text{formulas of SAr}\}$$


can be explicitly defined with the following properties: 


(a) At every point € the truth values of any formula and its translation coincide. 
(b) The sets of free variables of any formula and its translation coincide. 


We note that the mappings we define will not be inverse to each other! 


PROOF. 


(a) The translation from $L_1$Ar to SAr. The translation of a formula P will
be denoted by “P”. We first translate atomic formulas, and then use induction
on the length. The alphabet of SAr does not have addition, but it has both
multiplication and raising to a power, so that in place of $z = x + y$ we can
write $2^z = 2^x \cdot 2^y$.

($a_1$) Atomic formulas. They have the form $t_1 = t_2$. By “carrying out the
operations,” we replace every nonzero term in $L_1$Ar by a “normalized term,”
i.e., a polynomial of the form $\sum x_1^{i_1} \cdots x_n^{i_n}$, where the monomials
are written in the form $(\cdots((x_1 \cdot x_1) \cdots x_1) \cdot x_2) \cdots)$, then arranged
in lexicographic order, and finally separated by parentheses:
$(\cdots((m_1 + m_2) + m_3) + \cdots)$. It is clear how to correlate such a term $t$ to
the term “$2 \uparrow t$” in SAr. For example, “$2 \uparrow ((x_1) \cdot (x_1) + x_2)$” is
$(\bar{2} \uparrow (x_1) \cdot (x_1)) \cdot (\bar{2} \uparrow (x_2))$. By definition, the translation
“$2 \uparrow 0$” is $\bar{1}$. Then we define the translation of the formula
$t_1 = t_2$ to be “$2 \uparrow t_1$” $=$ “$2 \uparrow t_2$”. It is clear that such a formula
and its translation have the same variables and are true at the same points
$\xi$.

($a_2$) If “$Q$”, “$Q_1$”, and “$Q_2$” have already been defined, then “$\neg Q$”
is defined as “$Q$” $\downarrow$ “$Q$”. We similarly construct “$Q_1 * Q_2$” for the
other connectives $*$
(see “Digression: Syntax” in Chapter I).
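
That the single connective $\downarrow$ suffices can be checked mechanically. The
sketch below verifies truth tables for one possible choice of definitions; the
text only says that the other connectives are treated “similarly,” so the
particular formulas for disjunction, conjunction, and implication used here
are a standard choice assumed for illustration, and the function names are
ours.

```python
from itertools import product


def nor(p, q):
    """Joint denial: the value is 1 exactly when both arguments are 0."""
    return 1 if p == 0 and q == 0 else 0


def neg(q):
    return nor(q, q)                 # "¬Q" as "Q" ↓ "Q", as in the text


def disj(p, q):
    return neg(nor(p, q))            # assumed definition of "Q1 ∨ Q2"


def conj(p, q):
    return nor(neg(p), neg(q))       # assumed definition of "Q1 ∧ Q2"


def implies(p, q):
    return disj(neg(p), q)           # assumed definition of "Q1 → Q2"


# Compare with the usual truth tables on all four pairs of truth values.
for p, q in product((0, 1), repeat=2):
    assert neg(p) == 1 - p
    assert disj(p, q) == max(p, q)
    assert conj(p, q) == min(p, q)
    assert implies(p, q) == (1 if p <= q else 0)
print("all connectives agree with their truth tables")
```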

($a_3$) If “$Q$” has already been defined, then “$\forall x_k Q$” is defined as

$$x_k(\text{“}Q\text{”}) = x_k(x_k = x_k).$$

Both the formula and its translation are true at a point $\xi$ if and only if
$Q$ (and “$Q$”) are true at all variations $\xi'$ of $\xi$ along $x_k$. They also
have the same free variables, since by induction, we may assume that this is
the case for $Q$ and “$Q$”.

($a_4$) By definition, “$\exists x_k Q$” coincides with “$\neg \forall x_k \neg Q$”.

(b) The translation from SAr to $L_1$Ar. As before, we let “P” denote the
translation of a formula P, although this time P will be a formula in SAr and
“P” will be a formula in $L_1$Ar.

There is a subtle point here, namely, how to translate $x_1 = x_2 \uparrow x_3$. It
will be shown in Part II of the book that such a translation exists, and can
even be taken in the form $\exists x_4 \cdots \exists x_n\, p(x_1, x_2, x_3, x_4, \ldots, x_n)$,
where $p$ is an atomic formula in $L_1$Ar. Here we shall take this fact on faith,
and choose a translation “$x_1 = x_2 \uparrow x_3$” once and for all.

($b_1$) Translation of formulas in $Fl_1$. The following rules give an inductive
definition:

“$t_1 = t_2$” has exactly the same form if
$t_1, t_2 \in \{\text{variables}\} \cup \{1, 11, \ldots\}$ (of course, in the sense that
$x^{\prime\cdots\prime}$ is replaced by $x_k$ and $1 \cdots 1$ is replaced by
$(\cdots((1 + 1) + 1) + \cdots)$). “$x_k = t_1 \cdot t_2$” has the form
$\exists x_i \exists x_j($“$x_i = t_1$” $\wedge$ “$x_j = t_2$” $\wedge\ x_k = x_i \cdot x_j)$, and
“$x_k = t_1 \uparrow t_2$” has the form
$\exists x_i \exists x_j($“$x_i = t_1$” $\wedge$ “$x_j = t_2$” $\wedge$ “$x_k = x_i \uparrow x_j$”$)$,
where $x_i$ and $x_j$ are the first two variables not occurring in $t_1$ or $t_2$.
We similarly translate formulas with the left- and right-hand sides permuted,
and also with $1 \cdots 1$ instead of $x_k$. We further stipulate that “$t_1 = t_2$”
has the form $\exists x_i($“$x_i = t_1$” $\wedge$ “$x_i = t_2$”$)$, where $x_i$ is the first
variable not occurring in $t_1$ or $t_2$, and where we assume only that neither
$t_1$ nor $t_2$ is a variable or $1 \cdots 1$. It is clear that the truth function
and the set of free variables are preserved under these translations.

($b_2$) Suppose that the formulas in $Fl_{2i-1}$ have already been translated. Let

“$x_k(P_1) = x_k(P_2)$” be $\forall x_k($“$P_1$” $\Leftrightarrow$ “$P_2$”$)$, and
“$x_k(P)\bar{n}$” be “$P$”$(n)$,

where on the right $n = (\cdots((1 + 1) + 1) + \cdots)$ is substituted in place of all
free occurrences of $x_k$ in “$P$”. This completes the proof.


11 Undefinability of Truth: Tarski’s Theorem 


11.1. The language SAr is interpreted in N, and not in the set of its own formulas 
the way SELF is. In order to be able to determine the set of definable formulas, 
we number formulas by (certain) integers as follows. 

We number the symbols of the alphabet (of which there are nine) from 1 to
9 in any order, as long as the symbol 1 corresponds to the number 9. We then
set (here $a_i \in \{\text{alphabet of SAr}\}$ and $\nu(a_i)$ is the number of $a_i$)

$$\text{number}(a_1 \cdots a_k) = n(a_1 \cdots a_k) = \sum_{i=1}^{k} \nu(a_i)\,10^{k-i} + 1.$$

In other words, we obtain the number of an expression by replacing all of its
symbols by the corresponding decimal digits (1 is replaced by 9), then reading
the resulting number in the decimal system and adding 1. It is clear that an
expression can be reconstructed in a unique way if we know its number.
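
The numbering and its inverse are easy to sketch in Python. The assumptions
here are ours: the nine symbols are written with ASCII stand-ins (`.` for
multiplication, `^` for raising to a power, `|` for the connective), and they
are numbered in the order listed in `ALPHABET`; the only requirement taken from
the text is that the symbol 1 receives the number 9. The function names are
hypothetical.

```python
# Assumed ASCII stand-ins for the nine symbols of SAr, in an arbitrary order,
# except that the symbol "1" must come last so that nu("1") == 9.
ALPHABET = "x'.^=|()1"
NU = {symbol: i + 1 for i, symbol in enumerate(ALPHABET)}
SYMBOL = {digit: symbol for symbol, digit in NU.items()}


def number(expression):
    """n(a_1...a_k): read the digits nu(a_i) in base 10 and add 1."""
    value = 0
    for symbol in expression:
        value = 10 * value + NU[symbol]
    return value + 1


def expression_of(n):
    """Recover the expression whose number is n, or raise if there is none."""
    digits = str(n - 1)
    if n < 2 or "0" in digits:
        raise ValueError(f"{n} is not the number of any expression")
    return "".join(SYMBOL[int(d)] for d in digits)


formula = "(x).(x)=x'"
n = number(formula)
print(n)
print(expression_of(n) == formula)
```

Not every integer is the number of an expression: the digit 0 never occurs in
$n - 1$, which is why `expression_of` rejects such inputs.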

The name in SAr of the number of an expression P, i.e., $1 \cdots 1$ ($n(P)$ times),
is called the label of P. As in SELF, we shall denote the label of P by
${*}P{*}$ (but now this is abbreviated notation). We call the expression
$P{*}P{*}$ the display of P.


11.2. Definition. Let P(x) be a formula in SAr with one free variable x. 

(a) An expression Q satisfies P if the number of Q lies in the set {k|P(k) is 
true}. 

(b) An expression Q is displayed in P if the display of Q satisfies P. 

11.3. Lemma. Let $P(x)$ be as in 11.2. Let $P_E(x)$ denote the formula
$P((x) \cdot ((\overline{10}) \uparrow (x)))$ (i.e., the term “$x \cdot 10^x$” is substituted in
place of all free occurrences of $x$). Then the set of expressions satisfying
$P_E$ coincides with the set of expressions displayed in P.


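PROOF. If Q has number $k$, then the display of Q has number $k \cdot 10^k$ (which
is why 1 has number nine!):

$$n(Q{*}Q{*}) = n(Q \underbrace{1 \cdots 1}_{n(Q)\ \text{times}})
= (n(Q) - 1)\,10^{n(Q)} + \underbrace{9 \cdots 9}_{n(Q)\ \text{times}} + 1
= n(Q)\,10^{n(Q)}.$$

Hence, $n(Q)$ satisfies $P_E$ if and only if $n(Q{*}Q{*})$ satisfies P.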


11.4. Theorem. For any formula $P(x)$ as in 11.2, we have

$$\{\text{formulas satisfying } P\} \ne \{\text{true formulas}\}, \qquad
\{\text{formulas satisfying } P\} \ne \{\text{false formulas}\}.$$


PROOF. We consider the Tarski-Smullyan formula $S : x(P_E){*}x(P_E){*}$.
According to the definitions, we have (recall that $x(P_E)$ is a class term and
${*}x(P_E){*}$ is the name of a number)

$$S \text{ is true} \Leftrightarrow x(P_E) \text{ satisfies } P_E
\Leftrightarrow x(P_E) \text{ is displayed in } P \text{ (by Lemma 11.3)}
\Leftrightarrow \text{the display of } x(P_E) \text{ satisfies } P
\Leftrightarrow S \text{ satisfies } P.$$

Hence, S is either not false and satisfies P, or else is false and does not
satisfy P. Therefore, the set of formulas satisfying P cannot coincide with the
set of false formulas. As in §9, the formula S says, “I satisfy P.”

Similarly, the formula

$$x(((P) \downarrow (P))_E){*}x(((P) \downarrow (P))_E){*}$$

says, “I do not satisfy P,” and thus either satisfies P or is true, but not
both. The theorem is proved.


11.5. Of course, Lemma 11.3 is pure magic. The decimal system really has 
nothing to do with all this, and 1 did not really have to be number nine, but 
this way everything is much prettier. 

More generally, let ?Ar be any language of arithmetic with a finite alphabet
containing the alphabet of SAr. Let the rules for forming distinguished
expressions and the standard interpretation of formulas in ?Ar be an arbitrary
extension of the rules in SAr. We require only that the terms and formulas
in SAr keep their earlier meaning, and that for any formula $P(x)$ in ?Ar with
a free variable $x$, the expression $x(P(x))\bar{k}$ must be a formula in ?Ar and be
interpreted by the same recipe as in SAr. (For example, we might add to SAr
the + sign, the connectives, and the quantifiers, and then allow formulas to be
constructed by the rules of $\mathcal{L}_1$ as well, thereby embedding $L_1$Ar in ?Ar.)

Then the undefinability theorem 11.4 holds for ?Ar.

We must choose the numbering as follows: if $m$ is the number of elements in
the alphabet of ?Ar and $\nu$ is a numbering of the symbols for which
$\nu(1) = m$, then

$$n(a_1 \cdots a_k) = \sum_{i=1}^{k} \nu(a_i)(m+1)^{k-i} + 1.$$

Then, using the same conventions as before, we have

$$n(Q{*}Q{*}) = n(Q \underbrace{1 \cdots 1}_{n(Q)\ \text{times}})
= (n(Q) - 1)(m+1)^{n(Q)} + m \sum_{j=0}^{n(Q)-1} (m+1)^j + 1
= n(Q)(m+1)^{n(Q)}.$$

Defining $P_E(x)$ as $P((x) \cdot ((\overline{m+1}) \uparrow (x)))$, without any further
alterations we obtain Lemma 11.3 and Tarski’s theorem for ?Ar.
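
The identity $n(Q{*}Q{*}) = n(Q)(m+1)^{n(Q)}$ can also be confirmed numerically.
The sketch below is an illustration under our own conventions: an expression is
given directly as the list of its symbol numbers $\nu(a_i)$, each between 1 and
$m$, and the function names are hypothetical. The case $m = 9$ is the decimal
numbering of 11.1.

```python
def number(digits, m):
    """n(a_1...a_k) for symbol numbers in 1..m: read in base m + 1, then add 1."""
    value = 0
    for d in digits:
        if not 1 <= d <= m:
            raise ValueError("symbol numbers must lie between 1 and m")
        value = (m + 1) * value + d
    return value + 1


def display_number(digits, m):
    """Number of the display Q*Q*: Q followed by n(Q) copies of the symbol 1."""
    n_q = number(digits, m)
    return number(list(digits) + [m] * n_q, m)   # nu(1) = m


# Check n(Q*Q*) = n(Q)·(m+1)^n(Q) for a few short expressions and alphabets.
for m in (9, 12, 30):
    for digits in ([1], [2, 5], [m, 1, 3]):
        n_q = number(digits, m)
        assert display_number(digits, m) == n_q * (m + 1) ** n_q
print("identity verified")
```

The check makes visible why the symbol 1 must carry the largest number $m$: the
appended block of ones becomes a block of the largest digit in base $m + 1$,
and adding 1 carries all the way through it.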


11.6. Remarks 


(a) If Tarski’s theorem were not true, and there were a formula P(x) such 
that {Q|Q is a formula and P(n(Q)) is true} coincided with the set of all 
true formulas of arithmetic, then this would mean that all number-theoretic 
questions would reduce to a series of problems all of the same type. In- 
stead of asking, “Is assertion number n true?” we could ask, “Is P(m) true?” 
Although such an all-encompassing problem could still be rather complicated 
(in a certain sense even “infinitely complicated,” see Part III), Tarski’s theorem 
says that arithmetic has much more diversity than could be contained in any 
such single problem. 

(b) We still have reason to suspect that perhaps everything worked out this 
way because we could “cleverly” number the formulas. This is not the case; 
the results in Part II will imply that Tarski’s theorem remains true for any 
numbering in which a formula and its number can be effectively reconstructed 
from one another. 

(c) It is natural to ask whether the set of numbers of provable, or deducible, 
formulas is definable (for some set of axioms and rules of deduction, for example 
in SAr). The answer is yes, this set is definable. We shall give some intuitive 
considerations in this direction, which anticipate the systematic theory in
Part III. 

However we define the notion of provability, it is natural to expect it to
have the following property: there exists an algorithm (for example, a computer
program) that for any text of the given language determines whether this text
is a proof and, if so, of what formula.

We now write a program that constructs the texts in the language in
lexicographic order, verifies whether each one is a proof, and, when it is,
computes the number of the formula it proves. Roughly speaking, the graph of
the function (number of a proof) $\mapsto$ (number of the formula proved) is
definable in $L_1$Ar because machine logic and arithmetic are embedded in
$L_1$Ar. Hence, the set of numbers of provable formulas is definable in $L_1$Ar,
in SAr, or in any language
?Ar as in 11.5.
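
The shape of such a program can be sketched as follows. Everything specific
here is an assumption made for illustration: texts are enumerated by length
first and lexicographically within each length (so that every text is
eventually reached), the proof checker is passed in as a parameter, and
`toy_check` is a purely hypothetical stand-in that has nothing to do with a
real deductive system.

```python
from itertools import count, islice, product


def texts(alphabet):
    """Yield all nonempty strings over `alphabet`: shorter ones first,
    lexicographically within each length."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def provable(alphabet, check_proof):
    """Yield the formulas proved by the texts, in the order the proofs appear.

    `check_proof(text)` is assumed to return the proved formula,
    or None when `text` is not a proof.
    """
    for text in texts(alphabet):
        formula = check_proof(text)
        if formula is not None:
            yield formula


def toy_check(text):
    """Hypothetical checker: a 'proof' is a string a...ab, and it 'proves'
    the block of a's that precedes the final b."""
    body = text[:-1]
    if text.endswith("b") and body and set(body) == {"a"}:
        return body
    return None


print(list(islice(provable("ab", toy_check), 3)))
```

The enumeration never terminates on its own, which matches the discussion:
the program lists provable formulas one after another but gives no bound on
how long one must wait for a given formula to appear.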

Combining this discussion with Tarski’s theorem, we obtain the following
form of Gödel’s theorem:


11.7. Gödel’s Incompleteness Theorem for Arithmetic. In any language
of arithmetic of type ?Ar, and for any definition of deducibility in which the
set of (numbers of) deducible formulas is definable,

$$\{\text{true formulas}\} \ne \{\text{deducible formulas}\}.$$

In Part III we discuss more general formulations of this theorem and other
versions of the proof, and we give a detailed verification of the principle in
11.6(c) for deductions in $L_1$Ar.


Digression: Self-Reference 


In natural languages it is only recently that linguists have taken note of the 
so-called “performative” statements. The characteristic feature of such a state- 
ment is self-reference, which can be defined as the ability to “refer to a reality 
that it creates itself, because it is stated under circumstances which make it 
into an act” (E. Benveniste, La philosophie analytique et le langage, Les Et.
Philos., No. 1 (1963) 9). Examples of performative statements include, “I 
solemnly swear,” the saying of which constitutes the act of swearing; “I proclaim 
a general mobilization,” and “I appoint you director,” when these two state- 
ments come from an authority that has the power to carry out the respective 
acts. If we look carefully at the semantics of performative statements, we find 
an imperative nuance, even though it is expressed by the declarative mood of 
the verb. 

In this connection, it is interesting to compare the role of self-reference in 
formal and algorithmic languages (see also Section 1.2 of Chapter I). In for- 
mal languages (and, in general, in descriptive languages), self-reference leads to 
logical circles, to paradoxes, or, if we try to avoid logical circles, to demonstra- 
tions of certain inadequacies of the language. On the other hand, in algorithmic 
languages (and in general, in control languages and systems), self-reference is 
the most important device for turning a finite program into a process that is 
potentially arbitrarily long (“loops”); it takes part in the control instructions
(feedback), and is among the fundamental possibilities of the system. 

A similar dichotomy can also be found in psychological behavior—compare 
with the distinction between introspection and self-improvement. 

Finally, self-reference can play a role in the genetic causality of aging 
processes (of biological and social systems). A self-regenerating cycle, when 
repeated many times, leads to erosion at the place of generation. 


12 Quantum Logic 


12.1. The last section of this chapter is devoted to certain physical facts and 
to the mathematical constructions that have been developed to describe them. 
In particular, we discuss von Neumann’s theorem that it is impossible to intro- 
duce hidden variables into the quantum-mechanical picture of the world. This 
material, while not completely traditional for a course in logic, is relevant here 
for two reasons. 

In the first place, von Neumann’s theorem is a vivid example of a meta- 
physical assertion. It is concerned with properties of the language, rather than 
with the subatomic world described by the language, and thus is analogous to, 
for example, Tarski’s theorem in metamathematics. This is why it occupies an 
isolated position in physics, and why we are interested in it here. 

In the second place, analyzing quantum-mechanical phenomena reveals 
a profound divergence between the internal logical structures of the macroworld 
and the microworld. Although explanations of these differences by means of nat- 
ural language and natural logic are agonizingly difficult and, in the last analysis, 
always leave one feeling unsatisfied, these attempts to explain continue. The 
development of the foundations of physics in the twentieth century has taught 
us a serious lesson. Creating and understanding these foundations turned out 
to have very little to do with the epistemological abstractions that were of such 
importance to the twentieth-century critics of the foundations of mathematics: 
finiteness, consistency, constructibility, and in general, the Cartesian notion of 
intuitive clarity. Instead, completely unforeseen principles moved into the spot- 
light: complementarity, and a nonclassical, probabilistic truth function. The 
electron is infinite, capricious, and free, and does not at all share our love for 
algorithms. 

The following exposition is based on the article by S. Kochen and
E. P. Specker in J. Math. Mech., vol. 17, no. 1 (1967), 59-87. Sections 12.9-12.16 
contain pure algebra and formally do not depend on the preceding semiphysical 
considerations. 


12.2. The atom of orthohelium. We now describe certain characteristics of the
behavior of the physical system “an atom of orthohelium in the state $n = 2$,
$l = 0$, $s = 1$.” Such a helium atom is in an excited state: its two electrons
are on the second energy level, and their spin is pointed in the same
direction. Nevertheless, the state is metastable, because in order to fall to
the first energy level, the electrons must turn their spins in opposite
directions (parahelium); this creates a certain stability.

Spin is a physical quantity that is expressed in the same units as the
“angular momentum.” The total spin of our system (in atomic units: $h = 2\pi$)
is represented by a unit vector in physical three-dimensional space. As a first
approximation we may think of it as changing with time but having instantaneous
values that can be measured. (The inadequacy of this picture will soon
be demonstrated.)

An experiment for the purpose of measuring the instantaneous value of 
the spin of our system could consist in turning on a magnetic field having a 
specified geometry and registering the shift in energy levels (spectral lines) of 
the atom. Each outcome of such an experiment can be precisely interpreted as 
a measurement of the projection of the spin on some axis, which is uniquely 
determined by the geometry of the field. We shall identify these directions with 
points of the unit sphere $S^2$.

Quantum mechanics makes the following positive assertions concerning
measurements of the spin of orthohelium. The following quantities are
measurable:

(a) the projection $s(\alpha, t)$ of the spin in the direction $\alpha \in S^2$ at the
moment of time $t$;

(b) the lengths $|s|(\alpha_i, t)$, $i = 1, 2, 3$, of three projections of the spin in
three pairwise orthogonal directions $\{\alpha_1, \alpha_2, \alpha_3\} \subset S^2$ (a “frame”) at the
time $t$.

The predictions concerning the results of these measurements are as follows:

(c) $s(\alpha, t)$ is a random variable that can take only the values $-1$, $0$, $1$.
(The probabilities of these values can be predicted from the results of the
previous measurements, but this is not essential for us here.)

(d) $\sum_{i=1}^{3} |s|(\alpha_i, t) = 2$ for any frame $\{\alpha_1, \alpha_2, \alpha_3\}$ and any $t$.
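
For readers who want to see where such predictions come from, the following
sketch checks them on the standard $3 \times 3$ matrices of a spin-1 system.
These matrices are not given in the text; they are the usual textbook
representation, assumed here for illustration (with $\hbar = 1$). The squares of
the three spin projections on a frame commute with one another, which is the
algebraic counterpart of their being simultaneously measurable as in (b), and
they add up to twice the identity, which corresponds to (d).

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scaled(c, a):
    return [[c * entry for entry in row] for row in a]


def added(a, b):
    return [[a[i][j] + b[i][j] for j in range(3)] for i in range(3)]


# sqrt(2)·S_x and sqrt(2)·S_y have entries in {0, ±1, ±i}, so their squares
# can be halved exactly instead of working with 1/sqrt(2).
RT2_SX = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
RT2_SY = [[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]]
SZ = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]      # eigenvalues 1, 0, -1 as in (c)

SX2 = scaled(0.5, matmul(RT2_SX, RT2_SX))    # S_x^2
SY2 = scaled(0.5, matmul(RT2_SY, RT2_SY))    # S_y^2
SZ2 = matmul(SZ, SZ)                         # S_z^2

total = added(added(SX2, SY2), SZ2)
print(total == [[2, 0, 0], [0, 2, 0], [0, 0, 2]])         # sum of squares is 2·I
print(matmul(SX2, SY2) == matmul(SY2, SX2))               # the squares commute
print(matmul(RT2_SX, RT2_SY) == matmul(RT2_SY, RT2_SX))   # S_x, S_y themselves
```

The last comparison shows that the projections themselves, unlike their
squares, do not commute.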


12.3. Attempt at a classical interpretation. This could consist in adopting the
following hypotheses A and B:

A. There is a certain space $\Omega$ of “hidden variables” or “internal states” of
the system and a function $s(\alpha, t; \omega)$, $\omega \in \Omega$, such that if the system is in
the state $\omega$ at time $t$, then $s(\alpha, t; \omega)$ is the “true value of the projection
of the spin on the $\alpha$-axis” at this moment.

B. The probabilistic aspect of the predictions in 12.2(c) results from our not
knowing the exact values of $\omega = \omega(t)$, so that for some measure $d\mu(\omega)$ we
have

$$\text{mathematical expectation of } s(\alpha, t) = \int_\Omega s(\alpha, t; \omega)\, d\mu(\omega),$$

and similarly for $|s|$.

Generalizing, we might suppose that $\Omega$ does not depend only on the system
itself but also on the arrangement for measuring the spin; $\mu$ may depend on
the time, and so on. However, all of these possibilities actually contradict
the predictions in 12.2(c) for the following startling reason.


12.4. Proposition (Kochen, Specker). There does not exist a mapping
$S^2 \to \{0, 1\}$ such that for every frame $\{\alpha_1, \alpha_2, \alpha_3\}$ this mapping takes the
value zero on precisely one of the directions $\alpha_i$. Moreover, it is possible
to construct a finite system $\Gamma \subset S^2$ of 117 points with the following
property. For any mapping $k : \Gamma \to \{0, 1\}$ either there is a frame
$\{\alpha_1, \alpha_2, \alpha_3\} \subset \Gamma$ on which $k$ does not take the value 0 exactly once, or
else there is a pair of perpendicular directions $\{\alpha_1, \alpha_2\} \subset \Gamma$ on which $k$
equals 0.


Here we note that adopting both the assertions in 12.2 and the hypotheses
in 12.3 would allow us to construct such a mapping of the sphere. In fact, it
would be sufficient to consider

$$S^2 \to \{0, 1\} : \alpha \mapsto |s|(\alpha, t; \omega)$$

for fixed $t$ and $\omega$. By 12.2(c), $|s|$ takes only the values 0 and 1, and by
12.2(d), it takes the value 1 twice and 0 once on any frame $\{\alpha_1, \alpha_2, \alpha_3\}$.
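
The condition imposed on such a mapping can be tested by brute force on a
small finite set of directions. The nine directions below are a toy
configuration chosen by us, far smaller than the 117 points of the
proposition, and the function names are hypothetical. For this small set
admissible mappings do exist, which indicates why a much more intricate
configuration is needed to force a contradiction.

```python
from itertools import combinations, product

# Nine directions given by unnormalized integer vectors (an assumed toy set).
DIRECTIONS = [
    (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1),
]


def orthogonal(u, v):
    return sum(a * b for a, b in zip(u, v)) == 0


# All perpendicular pairs and all frames that lie inside DIRECTIONS.
PAIRS = [p for p in combinations(DIRECTIONS, 2) if orthogonal(*p)]
FRAMES = [f for f in combinations(DIRECTIONS, 3)
          if all(orthogonal(u, v) for u, v in combinations(f, 2))]


def admissible(k):
    """True if k is 0 exactly once on every frame and is never 0 on both
    members of a perpendicular pair."""
    one_zero_per_frame = all(sum(k[d] == 0 for d in f) == 1 for f in FRAMES)
    no_orthogonal_zeros = all(k[u] == 1 or k[v] == 1 for u, v in PAIRS)
    return one_zero_per_frame and no_orthogonal_zeros


colorings = [dict(zip(DIRECTIONS, values))
             for values in product((0, 1), repeat=len(DIRECTIONS))]
good = [k for k in colorings if admissible(k)]
print(len(FRAMES), len(good))
```

Proposition 12.4 asserts that for the 117-point system the corresponding list
of admissible mappings is empty.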

We prove Proposition 12.4 in Sections 12.12—12.15, and now proceed to a 
more systematic study of “quantum logic.” We shall adhere to our customary 
and useful dualism between “language and interpretation,” although these cat- 
egories are much less formalized and are harder to distinguish from each other 
in physics. 


12.5. The language of nonrelativistic quantum mechanics. We have a some- 
what unusual situation in that quantum mechanics does not really have its 
own language. More precisely, to describe a physical system S such as a “free 
electron” or “atom of helium in a magnetic field,” quantum mechanics uses a 
certain fragment of the language of functional analysis, “oriented on describing 
S.” Assuming that the reader is familiar with functional analysis, we shall 
limit ourselves to a glossary of the most frequently used terms. We also give 
some synonyms used by physicists to indicate the “physical sense,” i.e., the 
interpretation, which will be considered separately in our text. 


(a) A separable complex Hilbert space $\mathcal{H}_S$. Here we are also interested in
its one-dimensional subspaces and its vectors of length one. A synonym for
the former is the (pure) states, and for the latter is the (normalized)
$\psi$-functions, or, more precisely, the instantaneous values of the
$\psi$-functions.

(b) Unitary representations of $\mathbf{R}$ in $\mathcal{H}_S$: $t \mapsto U_t = e^{-iH_S t}$. For
synonyms we have $t \mapsto U_t$ is the dynamic group; $t$ is the time; and the
infinitesimal generator $H_S$ (which is a self-adjoint operator) is the dynamic
operator, or Hamiltonian, of S.

(c) Schrödinger equation: $\partial\psi_t/\partial t = -iH_S\psi_t$. It is satisfied by the
$\psi$-functions $\psi_t = e^{-iH_S t}\psi_0$, which evolve with time.

(d) Self-adjoint operators in $\mathcal{H}_S$. Synonym: the observables of the system.
The operator $H_S$ is an energy observable. The discrete spectrum of $H_S$ gives
us the energy levels of S. We shall be especially interested in the orthogonal
projection observables. Here the pure states $\mathbf{C}\psi \subset \mathcal{H}_S$ are in one-to-one
correspondence with the projections $P_\psi$ onto the corresponding subspace.

Another important class of projections is constructed using the spectral
decomposition theorem. Let $A = \int_{-\infty}^{\infty} \lambda\, dP_A(\lambda)$. Then the projection
$P_A(U)$ is defined for any Borel subset $U \subset \mathbf{R}$. In the simplest cases its
image is spanned by the vectors in $\mathcal{H}_S$ that are eigenvectors for A with
eigenvalues in U.

Projection observables are also called “questions” (Mackey) or
“Eigenschaften” (von Neumann).

(e) Commuting operators. Synonym: compatible (or simultaneously measurable)
observables. For unbounded operators A and B, whose formal commutator may have
an empty domain of definition, we define commutativity to mean that $P_A(U_1)$
and $P_B(U_2)$ commute for all Borel sets $U_1, U_2 \subset \mathbf{R}$.

(f) Unitary representations in $\mathcal{H}_S$ of various groups, such as SO(3), SU(2),
$S_n$, .... Synonym: symmetries of the system S (if the representations commute
with the Hamiltonian $H_S$), or approximate symmetries (if $H_S = H_0 + H_1$, where
the representations commute with $H_0$ and $H_1$ is a “small perturbation”).


12.7. EXAMPLE. Let S be “an electron in the electric field of a proton” (where
we disregard the motion of the proton, the spin, and the relativistic effects).
Here $\mathcal{H}_S = L^2(E^3)$ consists of the square integrable complex functions in the
Euclidean “physical coordinate space of the electron.”

$H_S$ is the self-adjoint extension of the operator

$$-\frac{h^2}{8\pi^2 m}\,\Delta - \frac{e^2}{r},$$

where $h$ is Planck’s constant, $m$ is the mass of the electron, $e$ is its
charge, and $r$ is its distance from the origin (where the proton is).

The energy levels (the discrete spectrum of $H_S$) are
$E_n = -(2\pi^2 m e^4/h^2)(1/n^2)$, $n = 1, 2, 3, \ldots$. The eigenfunctions $\psi$
corresponding to the points of this spectrum are the states of an electron in a
hydrogen atom. The energy level $n = 1$ corresponds to the unexcited state, and
the other values of $n$ correspond to excited states. The positive semiaxis is
the continuous spectrum of $H_S$; in states with positive electron energy, “the
hydrogen atom is ionized.”

The most important observables of the electron are the operators of
multiplication by the three coordinate functions $x_j$ (the coordinate
observables), and the self-adjoint extension of the operators
$p_j = (h/2\pi i)(\partial/\partial x_j)$ (the momentum projection observables). The operators
$x_j$ and $p_j$ do not commute, so that the $x_j$-coordinate and the projection of
the momentum on the $x_j$-axis are not simultaneously measurable.

The system S is spherically symmetric. The natural representation of SO(3)
in $L^2(E^3)$ commutes with $H_S$. The restriction of this representation to the
subspace of $\mathcal{H}_S$ corresponding to the discrete spectrum of $H_S$ in a natural
way splits into a direct sum of representations corresponding to a given energy
level $E_n$. This $E_n$-subspace, in turn, splits into a direct sum of
representations of SO(3) on spherical polynomials of degree
$j = 0, 1, 2, \ldots, n - 1$ with multiplicity one. If the $\psi$-function of the
electron belongs to the level $E_n$ and the subspace corresponding to the
representation of SO(3) on spherical polynomials of degree $j$, we say that $n$
and $j$ are the principal and orbital quantum numbers, respectively, of the
electron’s state in the hydrogen atom.


The above text is typical of what might be found in a physics textbook.
The “language” is mixed with the “metalanguage” that gives the standard
interpretation of the language. We now describe them separately and more
systematically.


12.8. The interpretation. A very important aspect of the interpretation that
we shall not discuss here is the list of informal recipes for choosing
$\mathcal{H}_S$, $H_S$, and the observables corresponding to a given system S. These
“units of expression” are often chosen in two stages: a classical description
is chosen, and then the “rules of quantization” are applied to it. This
procedure might be “approximate” in the sense that certain circumstances are
not taken into account (such as the spin in 12.7).

Suppose that $\mathcal{H}_S$ and $H_S$ have already been chosen. The most characteristic
peculiarity of the interpretation of quantum language is that it is
“two-layered.” Part of the mathematical statements are interpreted as
assertions about a “freely evolving system,” and part are interpreted as
assertions about the results of observations on this system.

(a) Freely evolving system. It is generally believed that the system’s
$\psi$-function $\psi_t \in \mathcal{H}_S$ gives (within the framework of a given approximation)
maximally complete information about the state of the system at time $t$. As
long as no one looks in on the system, $\psi_t$ evolves as $e^{-iH_S t}\psi_0$, starting
from the initial state $\psi_0$. (How do we know $\psi_0$? See Section 12.8(c) below.)

(b) Observation. Suppose we want to measure the instantaneous value of 
some physical quantity for our system S at the moment t. This quantity 
corresponds to an observable A. (How do we know the form of A? See the 
beginning of 12.8.) For simplicity we suppose that A has a discrete spectrum 
with all multiplicities one. The predictions of what will be observed are as 
follows. 

If $A\psi_t = a\psi_t$, then $a$ will be the value of the observable A at the time
$t$ for the system S in the state with $\psi$-function $\psi_t$.

In the general case, let $\psi^{(i)}$, $i = 1, 2, \ldots$, be an orthonormal basis for
$\mathcal{H}_S$ consisting of eigenvectors for A. We expand $\psi_t$ with respect to this
basis: $\psi_t = \sum_i a^{(i)}(t)\psi^{(i)}$. Let $A\psi^{(i)} = a_i\psi^{(i)}$. Then the result of
measuring A will be a random variable taking the value $a_i$ with probability
$|a^{(i)}(t)|^2$. (It is easy to see that the mathematical expectation of this
random variable is $(A\psi_t, \psi_t)$. This formula holds for all A. More generally,
the probability of A falling in a Borel subset $U \subset \mathbf{R}$ is equal to
$(P_A(U)\psi_t, \psi_t)$, where $P_A(U)$ was
defined in 12.5(d).)
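
In a finite-dimensional setting these predictions amount to a short
computation. The sketch below assumes, as in the text, a discrete spectrum
with all multiplicities one, and takes the state as its list of coefficients
$a^{(i)}$ in an orthonormal eigenbasis of A; the function name and the numerical
example are hypothetical.

```python
def measurement_statistics(eigenvalues, amplitudes):
    """Return (probabilities, expectation) for a unit vector given by its
    coefficients in an orthonormal eigenbasis of the observable A."""
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if abs(norm_sq - 1.0) > 1e-12:
        raise ValueError("the state vector must have length one")
    # Probability of registering the i-th eigenvalue: |a^(i)|^2.
    probabilities = [abs(a) ** 2 for a in amplitudes]
    # Expectation sum_i a_i |a^(i)|^2, which equals (A psi, psi).
    expectation = sum(p * value for p, value in zip(probabilities, eigenvalues))
    return probabilities, expectation


# Hypothetical example: eigenvalues -1, 0, 1 and the state
# psi = (3/5)·psi^(1) + (4i/5)·psi^(3).
probs, mean = measurement_statistics([-1, 0, 1], [0.6, 0.0, 0.8j])
print(probs, mean)
```

The phase of a coefficient (here the factor $i$) does not affect the
probabilities, since only the absolute values $|a^{(i)}|$ enter.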

(c) System evolving after observation. With the same assumptions as before,
the $\psi$-function of the system after the observation is determined by the
result of the observation. If we registered the value $a_i$ for $A$ at the time $t_0$,
then, starting from $\varphi^{(i)}$ at $t_0$, $S$ evolves until the next observation completely
independently of how it evolved before.

Thus, the result of the observation lets us know the form of the $\psi$-function
after the observation, but it tells us nothing about the $\psi$-function before the
observation. Hence, physicists often say that registering the value $a_i$ of $A$ prepares
the system in the state $\varphi^{(i)}$ at the time $t_0$. Another synonym: at the moment
of observation the $\psi$-function of the system reduces to $\varphi^{(i)}$.

If we were able simultaneously to register the values of two observables, then
we would prepare the system with a $\psi$-function that is an eigenfunction for both
observables. Since noncommuting observables do not have a common basis of
eigenvectors, in general the values of such variables are not simultaneously measurable.
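The prediction rule in (b) can be checked numerically in a finite-dimensional toy case. The following is a minimal sketch in Python, assuming a two-dimensional state space and a hypothetical observable $A = \mathrm{diag}(+1, -1)$; the function name `measurement_distribution` and the sample state are illustrative and do not come from the text.

```python
import math

def measurement_distribution(psi, eigenbasis, eigenvalues):
    """Return {a_i: |a^(i)|^2} for the state psi.

    psi         -- list of complex amplitudes (normalized inside the function)
    eigenbasis  -- orthonormal eigenvectors phi^(i) of the observable A
    eigenvalues -- the eigenvalue a_i of eigenbasis[i] (assumed pairwise distinct)
    """
    norm = math.sqrt(sum(abs(c) ** 2 for c in psi))
    psi = [c / norm for c in psi]
    dist = {}
    for phi, a in zip(eigenbasis, eigenvalues):
        # Coefficient a^(i) = (psi, phi^(i)); the basis vector is conjugated.
        coeff = sum(p * q.conjugate() for p, q in zip(psi, phi))
        dist[a] = abs(coeff) ** 2
    return dist

# Hypothetical observable A = diag(+1, -1): the standard basis is an eigenbasis.
basis = [[1 + 0j, 0j], [0j, 1 + 0j]]
values = [1.0, -1.0]
psi = [1 + 0j, 1j]  # unnormalized superposition of the two eigenvectors

dist = measurement_distribution(psi, basis, values)
# Expectation of the random variable, which should agree with (A psi, psi).
expectation = sum(a * p for a, p in dist.items())
```

For this state both outcomes are equally likely, so the expectation vanishes; the probabilities always sum to 1 because the basis is orthonormal and the state is normalized.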


12.9 Quantum logic. We now investigate the algebraic framework of quantum 
logic. We start with the following analogous situation. 

Suppose we are given a formal language in $\mathcal{L}_1$ having one variable and an
interpretation of this language in a set $M$ where this variable takes values.
Then we can distinguish the Boolean algebra $B$ of definable sets in $M$ (see
§3). The conjunction of formulas corresponds to the Boolean intersection of
the sets that define them, and so on. By definition, $N \in B$ if we can ask in the
language, “Does the value of the variable belong to $N$?” The algebra $B$ is the
most important invariant of the pair {language, interpretation}.

We now consider the language of quantum mechanics, oriented toward describing
a system $S$. We shall exclude the time aspect by fixing a moment of time
to which all statements about the state of the system refer. Then the “state
of the system” will be the only variable in the language. It takes values in
the set of lines in the Hilbert space $\mathcal{H}_S$. The only questions to which we can
give a yes or no answer are those of the form: “Does the state of the system
belong to a given closed subspace of $\mathcal{H}_S$?” It is the closed subspaces of $\mathcal{H}_S$
that form the analogue of the Boolean algebra $B$. The conjunction of questions
corresponds to the intersection of subspaces, and the disjunction corresponds to
their sum, but both operations can be performed only when the corresponding
projection observables commute. Only in this case are the Boolean identities
fulfilled.

We axiomatize the situation as follows: 


12.10. Definition. A partial Boolean algebra is a set $B$ together with the following
structures on $B$:


(a) A reflexive and symmetric binary relation $*$ called “compatible measurability.”
Instead of $(a, b) \in *$ we write $a * b$.

(b) Partial binary operations $\vee$ and $\wedge$ and a unary operation $'$.

(c) Two elements $0$ and $1 \in B$.


These structures must satisfy the following axioms:


(d) The relation $*$ is closed with respect to $\wedge$, $\vee$, and $'$: if $a_1$, $a_2$, and $a_3$ are pairwise
compatibly measurable, then $(a_1 \wedge a_2) * a_3$, $(a_1 \vee a_2) * a_3$, and $a_1' * a_3$; in
addition, $a * 0$ and $a * 1$ for all $a \in B$.

(e) If $a_1$, $a_2$, and $a_3$ are pairwise compatibly measurable, then together with
$0$ and $1$ they generate a Boolean algebra relative to the operations $\vee$, $\wedge$,
and $'$.


12.11. EXAMPLE. Let $H$ be a Hilbert space (possibly real and finite-dimensional).
The partial Boolean algebra $B(H)$ is defined as the set of closed subspaces of
$H$ with the following structures:

(a) $a * b$ if and only if there exist three pairwise orthogonal closed subspaces
$c, d, e \subset H$ such that $a = c \oplus d$ and $b = e \oplus d$. The motivation for this definition
is that this condition is equivalent to commutativity of the projections
onto $a$ and $b$.

(b) $a \wedge b$ = the intersection of $a$ and $b$.

(c) $a \vee b$ = the sum of $a$ and $b$.

(d) $a'$ = the orthogonal complement of $a$.

(e) $0 = \{0\}$ and $1 = H$.
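The criterion in (a), compatibility as commutativity of projections, is easy to test in the real three-dimensional case. The sketch below is an illustration under stated assumptions: subspaces of $\mathbb{R}^3$ are given by orthonormal spanning vectors, and the helper names (`projection`, `compatible`) are hypothetical.

```python
import math

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def projection(orthonormal_vectors, dim=3):
    """Orthogonal projection onto the span of the given orthonormal vectors."""
    P = [[0.0] * dim for _ in range(dim)]
    for v in orthonormal_vectors:
        P = add(P, outer(v, v))
    return P

def compatible(P, Q, tol=1e-12):
    """Two subspaces are compatibly measurable iff their projections commute."""
    PQ, QP = matmul(P, Q), matmul(Q, P)
    n = len(P)
    return all(abs(PQ[i][j] - QP[i][j]) <= tol for i in range(n) for j in range(n))

x = (1.0, 0.0, 0.0)
y = (0.0, 1.0, 0.0)
d = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)  # diagonal direction in the xy-plane

line_x = projection([x])
line_d = projection([d])
plane_xy = projection([x, y])
```

Here `line_x` and `plane_xy` are compatible (take $d$ = the line, $c = 0$, $e$ = the $y$-axis in the decomposition of (a)), while the two non-orthogonal lines `line_x` and `line_d` are not.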


One form for the theorem that there are no hidden variables is as follows.

12.12. Theorem. If $\dim H \ge 3$, then $B(H)$ cannot be embedded in a Boolean
algebra in such a way that the operations are preserved.

This result can be strengthened formally in various ways: see §5 of Kochen
and Specker, and also N. Zierler and M. Schlesinger, Duke Math. J., vol. 32, no. 2
(1965), 251-262. We shall not dwell on this here.


PROOF. We choose a real Euclidean space $E^3 \subset H$ and show that even $B(E^3)$
cannot be embedded in a Boolean algebra. Otherwise there would exist a homomorphism
of the partial Boolean algebra $B(E^3)$ onto the two-element Boolean
algebra $\{0, 1\}$, since for any pair of elements in any Boolean algebra, there exists
a homomorphism onto $\{0, 1\}$ that separates them.

Let $h$ be such a homomorphism. If $a_1, a_2, a_3 \subset E^3$ are pairwise orthogonal
lines, then $h(a_i \wedge a_j) = h(a_i) \wedge h(a_j) = 0$ for $i \ne j$. Hence, in any pair of
orthogonal lines, at least one of the pair must go to $0$ under $h$. Furthermore,
$h(a_1 \vee a_2 \vee a_3) = h(a_1) \vee h(a_2) \vee h(a_3) = h(E^3) = 1$. Hence, in any frame exactly
one of the lines goes to $1$.

If we map the points of the unit sphere $S^2$ onto the lines joining them to
the origin and then apply $h$, we obtain a mapping of $S^2$ with the property in
Proposition 12.4 (where we have only to switch the roles of 0 and 1). We prove
that no such map exists even on a certain subset consisting of 117 points on $S^2$.
The latter stronger result is combinatorially elegant and physically meaningful:
a physicist might object to the requirement of measuring the projection
of the spin of orthohelium simultaneously in all directions, independently of the
question whether hidden variables are possible. In fact, we need only finitely
many directions to show the futility of such an attempted measurement.

Consider a finite graph. By a realization of the graph on $S^2$ we mean any
embedding of the set of its vertices in $S^2$ for which the distance between the
endpoints of any edge equals 90°.


12.13. Lemma. Let $\alpha$ and $\beta$ be points on $S^2$ such that the sine of the angle
between them $\in [0, \frac{1}{3}]$. Then there exists a realization of the following graph $\Gamma_1$
in which $a_0$ goes to $\alpha$ and $a_9$ goes to $\beta$.

PROOF. Let $x, y, z$ be a triple of pairwise orthogonal vectors on $S^2$. We take $a_5$
to $x$ and $a_6$ to $z$. For certain $\xi, \eta \in \mathbb{R}$ (to be chosen later), we set

$$a_1 \mapsto \frac{y + \xi z}{\sqrt{1 + \xi^2}}, \qquad a_2 \mapsto \frac{x + \eta y}{\sqrt{1 + \eta^2}}.$$

Then the images of $a_3$ and $a_4$ are determined up to a sign by the property of
being orthogonal to $(a_1, a_5)$ and $(a_2, a_6)$, and we choose

$$a_3 \mapsto \frac{\xi y - z}{\sqrt{1 + \xi^2}}, \qquad a_4 \mapsto \frac{\eta x - y}{\sqrt{1 + \eta^2}}.$$

We similarly set

$$a_0 \mapsto \frac{\xi\eta x - \xi y + z}{\sqrt{1 + \xi^2 + \xi^2\eta^2}}, \qquad a_7 \mapsto \frac{x + \eta y + \xi\eta z}{\sqrt{1 + \eta^2 + \xi^2\eta^2}},$$

and finally, $a_8$ and $a_9$ are determined up to sign. The sine of the angle between
$a_0$ and $a_9$ is easy to compute: it equals

$$\frac{\xi\eta}{\sqrt{(1 + \xi^2 + \xi^2\eta^2)(1 + \eta^2 + \xi^2\eta^2)}}.$$

As $\xi$ and $\eta$ vary, this expression takes on all values in $[0, \frac{1}{3}]$.
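The construction in this proof can be verified numerically. The sketch below is an illustration under stated assumptions: since the figure of $\Gamma_1$ is not reproduced here, the edge list is reconstructed from the orthogonality relations used in the proof and in 12.15 (with $a_5 \mapsto x$, $a_6 \mapsto z$, $a_0$ orthogonal to $a_1, a_2$ and $a_7$ orthogonal to $a_3, a_4$), and all function names are hypothetical.

```python
import math

# Edges of Gamma_1 as pairs of vertex indices (a reconstruction, see above).
GAMMA1_EDGES = [
    (0, 1), (0, 2), (0, 8),
    (1, 3), (1, 5), (3, 5),
    (2, 4), (2, 6), (4, 6),
    (5, 6), (3, 7), (4, 7),
    (7, 8), (7, 9), (8, 9),
]

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def realize_gamma1(xi, eta):
    """Images of a_0, ..., a_9 on the unit sphere, with x, y, z the standard basis."""
    a = {}
    a[5] = (1.0, 0.0, 0.0)                        # x
    a[6] = (0.0, 0.0, 1.0)                        # z
    a[1] = normalize((0.0, 1.0, xi))              # y + xi z
    a[2] = normalize((1.0, eta, 0.0))             # x + eta y
    a[3] = normalize((0.0, xi, -1.0))             # xi y - z
    a[4] = normalize((eta, -1.0, 0.0))            # eta x - y
    a[0] = normalize((xi * eta, -xi, 1.0))        # xi eta x - xi y + z
    a[7] = normalize((1.0, eta, xi * eta))        # x + eta y + xi eta z
    a[8] = normalize(cross(a[0], a[7]))           # orthogonal to a_0 and a_7
    a[9] = normalize(cross(a[7], a[8]))           # orthogonal to a_7 and a_8
    return a

def sine_a0_a9(xi, eta):
    """Closed formula for the sine of the angle between a_0 and a_9."""
    return xi * eta / math.sqrt((1 + xi**2 + xi**2 * eta**2)
                                * (1 + eta**2 + xi**2 * eta**2))

a = realize_gamma1(0.5, 2.0)
# Largest |cosine| over all edges; a realization requires this to vanish.
max_edge_cosine = max(abs(dot(a[i], a[j])) for i, j in GAMMA1_EDGES)
# Sine of the angle between the unit vectors a_0 and a_9, computed geometrically.
measured_sine = math.sqrt(sum(c * c for c in cross(a[0], a[9])))
```

With $\xi = \eta = 1$ the closed formula gives exactly $\frac{1}{3}$, the upper end of the interval in the lemma.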


12.14. Lemma. Consider the graph $\Gamma_2$ that is obtained from Figure 1 by identifying
the vertices $a = p_0$, $b = q_0$, and $c = r_0$ (the apparent intersections of the
edges inside the circle are not vertices). This graph is realized on $S^2$.


PROOF. For $0 \le k \le 4$ set

$$p_k \mapsto \cos\frac{\pi k}{10}\, x + \sin\frac{\pi k}{10}\, y,$$
$$q_k \mapsto \cos\frac{\pi k}{10}\, y + \sin\frac{\pi k}{10}\, z,$$
$$r_k \mapsto \sin\frac{\pi k}{10}\, x + \cos\frac{\pi k}{10}\, z.$$

Since $\sin(\pi/10) < \frac{1}{3}$, we can first extend this map to a realization of the subgraph
between the points $p_0$, $p_1$, and $r_0$ using the preceding lemma. Rotating
the resulting realization around $r_0$ so as to take $(p_0, p_1)$ to $(p_1, p_2), (p_2, p_3), \dots$,
we obtain a realization of the “lower arc” and $r_0$. By similarly rotating around
the images of $p_0$ and $q_0$, we obtain a realization of the other two arcs as well.


12.15. END OF THE PROOF OF PROPOSITION 12.4 AND THEOREM 12.12.
Consider an arbitrary map $k$ of the vertices of the graph $\Gamma_2$ to $\{0, 1\}$. Suppose
that exactly one vertex in each triangle goes to 1 and at least one of the
two vertices on each edge goes to 0. In the triangle $\{p_0, r_0, q_0\}$ suppose that $p_0$
goes to 1. We consider the copy of the graph $\Gamma_1$ between the vertices $p_0$, $r_0$,
and $p_1$, which we identify with $a_0$, $a_8$, and $a_9$, respectively.

Figure 1.


We must have $k(p_1) = k(a_9) = 1$. In fact, if we had $k(a_9) = 0$, then we
would also have $k(a_7) = 1$, and then $k(a_1) = k(a_2) = k(a_3) = k(a_4) = 0$, and
$k(a_5) = k(a_6) = 1$, which is a contradiction.
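The combinatorial step just made can also be confirmed by exhaustive search over all $2^{10}$ maps on the vertices of $\Gamma_1$. This is only a sketch: the figure is not available here, so the edge and triangle lists below are an assumption reconstructed from the proof of Lemma 12.13 and from the argument above, and the names are hypothetical.

```python
from itertools import product

# Assumed edge list of Gamma_1 (vertex i stands for a_i).
EDGES = [
    (0, 1), (0, 2), (0, 8),
    (1, 3), (1, 5), (3, 5),
    (2, 4), (2, 6), (4, 6),
    (5, 6), (3, 7), (4, 7),
    (7, 8), (7, 9), (8, 9),
]
# The triangles contained in that edge list.
TRIANGLES = [(1, 3, 5), (2, 4, 6), (7, 8, 9)]

def admissible(k):
    """k is a tuple of ten values in {0, 1}, one per vertex."""
    no_edge_with_two_ones = all(not (k[i] and k[j]) for i, j in EDGES)
    one_per_triangle = all(k[i] + k[j] + k[m] == 1 for i, j, m in TRIANGLES)
    return no_edge_with_two_ones and one_per_triangle

colorings = [k for k in product((0, 1), repeat=10) if admissible(k)]

# Maps with k(a_0) = 1 but k(a_9) = 0: the argument says there are none.
violations = [k for k in colorings if k[0] == 1 and k[9] == 0]
# Maps with k(a_0) = k(a_9) = 1 do exist, so the constraint is not vacuous.
witnesses = [k for k in colorings if k[0] == 1 and k[9] == 1]
```

The search agrees with the argument: every admissible map with $k(a_0) = 1$ also has $k(a_9) = 1$.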

We now return to $\Gamma_2$. Since $k(p_0) = k(p_1) = 1$, we similarly find that
$k(p_2) = 1$, and then $k(p_3) = k(p_4) = k(q_0) = 1$. But $k(q_0) = 1$ contradicts the
fact that $k(p_0) = 1$. This completes the proof.


12.16. Quantum tautologies. This theme has been largely neglected. We give a
counterexample due to Kochen and Specker and formulate some recent results
of Gelfand and Ponomarev.

(a) Counterexample. This consists of the following: it is possible to give
a logical polynomial in 117 variables that represents a classical tautology
but that is defined and takes the value 0 in the partial Boolean algebra
$B(E^3)$ for some values of the variables. This is simply another aspect of the
impossibility of embedding $B(E^3)$ in a Boolean algebra.

In fact, let $P(p, q, r)$ be a logical polynomial in three variables that takes
the truth value 1 when exactly one of $|p|$, $|q|$, and $|r|$ is 1. We may assume that
only the connectives $\vee$, $\wedge$, and $\neg$ occur in $P$. Similarly, let $Q(p, q) = \neg p \vee \neg q$.
Then $Q$ takes the value 1 when at least one of $|p|$, $|q|$ is 0. We index the vertices
of $\Gamma_2$ from 1 to 117 and set

$$R(p_1, \dots, p_{117}) = \neg\Bigl( \bigwedge_{\{i,j,k\}} P(p_i, p_j, p_k) \wedge \bigwedge_{\{r,s\}} Q(p_r, p_s) \Bigr).$$

The first $\bigwedge$ is taken over all triples $\{i, j, k\}$ corresponding to triangles in $\Gamma_2$,
and the second $\bigwedge$ is taken over all pairs $\{r, s\}$ corresponding to edges. The
argument in 12.15 shows that for any mapping $\{p_1, \dots, p_{117}\} \to \{0, 1\}$ at
least one of the Boolean factors takes the value 0. Hence $R$ is a classical
tautology.

But if we substitute for $p_i$ the line from the origin to the image of the $i$th
vertex in a fixed realization of $\Gamma_2$, then we obtain for the value of $R$ the element
$0 \in B(E^3)$. In fact, if $p_r$ and $p_s$ are orthogonal, then $\neg p_r \vee \neg p_s = E^3$. Similarly,
if $p_i$, $p_j$, and $p_k$ are orthogonal, then $P(p_i, p_j, p_k) = 1 \in B(E^3)$. The latter
assertion is verified as follows: if we set

$$a + b = (a \wedge \neg b) \vee (\neg a \wedge b),$$

then we may take

$$P(p, q, r) = p + q + r + p \wedge q \wedge r$$

(for any arrangement of parentheses on the right), so that

$$P(p_i, p_j, p_k) = p_i \oplus p_j \oplus p_k = E^3.$$
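On classical truth values, the expression for $P$ in terms of the operation $a + b$ (exclusive or) can be checked by a truth table. The following is a small sketch in Python with illustrative function names; it confirms that $p + q + r + p \wedge q \wedge r$ is true exactly when one of the three variables is true, whichever way the sum is parenthesized.

```python
from itertools import product

def plus(a, b):
    # a + b = (a AND NOT b) OR (NOT a AND b)
    return (a and not b) or (not a and b)

def P(p, q, r):
    # p + q + r + (p AND q AND r), parenthesized from the left
    return plus(plus(plus(p, q), r), p and q and r)

def P_other_grouping(p, q, r):
    # The same sum, parenthesized from the right
    return plus(p, plus(q, plus(r, p and q and r)))

def Q(p, q):
    return (not p) or (not q)

truth_table = {
    (p, q, r): P(p, q, r) for p, q, r in product((False, True), repeat=3)
}
```

`truth_table` has the value `True` on exactly the three assignments with a single true variable, and `Q` fails only when both of its arguments are true.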


(b) Results of Gelfand and Ponomarev. We start with the following observation.
The operations $\wedge$, $\vee$, and $'$ are actually defined everywhere on the set
$B(H)$ of closed subspaces of the Hilbert space $H$, although they do not satisfy
the Boolean axioms, and if we ignore the compatible measurability relation $*$,
it seems as if they no longer have physical meaning.

Nevertheless, it is also natural to investigate these structures, which were 
first introduced into the logic of quantum mechanics by G. Birkhoff and J. 
von Neumann (Annals of Math. vol. 37 (1936), 823-843). Here is how these 
structures are axiomatized: 


Definition. A modular structure $L$ is a set with binary operations $\wedge$ and $\vee$ that
satisfy the following conditions:

(a) $\wedge$ and $\vee$ are associative and commutative;

(b) $a \wedge a = a \vee a = a$ for all $a \in L$;

(c) If $a \wedge b = a$, then $(a \vee c) \wedge b = a \vee (c \wedge b)$ (the “modular identity”).


Birkhoff and von Neumann also require an “orthogonal complement” operation 
to exist with the usual axioms, but we shall omit this here. 

We note that the modular identity is fulfilled universally in $B(H)$ only if
$H$ is finite-dimensional. It is also fulfilled for triples $a, b, c$ whose elements have
finite dimension or codimension in $H$.

I. M. Gelfand and V. A. Ponomarev (Uspehi mat. nauk, vol. XXIX (1974),
No. 6 (180), 3-58) have studied the linear representations of free modular structures
with $r$ generators in $B(H)$ for finite-dimensional spaces over arbitrary
fields. Such a representation is called indecomposable if it does not split into a
direct sum of representations in $B(H_1) \oplus B(H_2)$.


Definition. A modular question is an element of a free modular structure 
that takes the value 0 or 1 for any indecomposable finite-dimensional 
representation. 

One of the main results of Gelfand and Ponomarev is the construction of a 
very nontrivial countable series of modular questions. We shall only formulate 
these results here. 

Let $L^n$ be a free modular structure with $n$ generators $\{a_1, \dots, a_n\}$. We
set $I = \{1, \dots, n\}$. A sequence $\alpha = (i_1, \dots, i_l)$ of length $l \ge 1$ of elements
of $I$ is called admissible if it does not have any identical neighboring entries.
A sequence $\beta = (k_1, \dots, k_{l-1})$ of length $l - 1$ of elements of $I$ is called subordinate
to $\alpha$ if it is admissible and if $k_j \notin \{i_j, i_{j+1}\}$ for all $j \le l - 1$. For admissible $\alpha$ we
inductively define

$$a_\alpha = a_{i_1 \dots i_l} = a_{i_1} \wedge \Bigl(\bigvee_\beta a_\beta\Bigr),$$

where $\beta$ runs through all sequences subordinate to $\alpha$. Further, for $t \in \{1, \dots, n\}$
we define

$$A_t(l) = \bigvee_\alpha a_\alpha,$$

where $\alpha$ runs through all admissible sequences of length $l$ with last entry $t$.
Finally, we set

$$H_t(l) = \bigvee_{j \ne t} A_j(l).$$

The substructure in $L^n$ generated by the elements $H_1(l), \dots, H_n(l)$ consists
entirely of modular questions for all $l \ge 1$.
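The index sets in this construction are purely combinatorial and can be enumerated directly. The following sketch, with hypothetical function names, lists admissible sequences over $I = \{1, \dots, n\}$ and the sequences subordinate to a given one; it does not attempt to model the lattice elements themselves.

```python
from itertools import product

def admissible_sequences(n, length):
    """All sequences over {1, ..., n} of the given length without equal neighbors."""
    return [s for s in product(range(1, n + 1), repeat=length)
            if all(s[j] != s[j + 1] for j in range(length - 1))]

def subordinate_sequences(alpha, n):
    """Admissible sequences (k_1, ..., k_{l-1}) with k_j not in {i_j, i_{j+1}}."""
    l = len(alpha)
    return [beta for beta in admissible_sequences(n, l - 1)
            if all(beta[j] not in (alpha[j], alpha[j + 1]) for j in range(l - 1))]

# With n = 4 generators, the sequence (1, 2, 1) has two subordinate sequences.
sub_n4 = subordinate_sequences((1, 2, 1), 4)
# With n = 3 generators it has none: both entries would have to equal 3.
sub_n3 = subordinate_sequences((1, 2, 1), 3)
```

There are $n(n-1)^{l-1}$ admissible sequences of length $l$, since each entry after the first only has to avoid its predecessor.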

This is a difficult result. It is relatively easy to prove that this substructure 
is a Boolean algebra consisting of $2^n$ elements. If we substitute the elements
in this Boolean algebra for the variables in the usual Boolean tautologies, we 
obtain “quantum tautologies,” but to see this we must consider structures with 
complements. 

It is not yet clear whether this algebra leads to nontrivial physics. Perhaps 
one should combine it with the techniques in the representation theory of sym- 
metry groups. 


12.17. The orthohelium atom revisited. In conclusion, we return to the 
orthohelium atom S and show how the material in 12.2 looks from a more 
general vantage point. 

(a) Choice of $\mathcal{H}_S$. As explained in 12.7, an electron without spin corresponds
to the space $L^2(E^3)$. If we want to take the spin into account, we must introduce
a “two-component” $\psi$-function, i.e., use the space $L^2(E^3) \otimes \mathbb{C}^2$. The system of
two electrons in helium is described by $\psi$-functions in the tensor square of this
space. However, by Pauli’s principle, the $\psi$-function of this system must behave
antisymmetrically when the electrons corresponding to the two parts of the
tensor square are permuted. Hence, we finally obtain $\mathcal{H}_S = \Lambda^2(L^2(E^3) \otimes \mathbb{C}^2)$.

(b) Choice of $H_S$. This is a difficult problem, because each electron moves in
the variable electromagnetic field created by the nucleus and the other electron.
The principal term in the Hamiltonian corresponds to the spherically symmetric
constant potential obtained by averaging over time. The remainder is treated
as a small perturbation. We give the approximate form of the $\psi$-function of
orthohelium, more precisely, of the element in $\Lambda^2(L^2(E^3))$ corresponding to the
projection of $\mathcal{H}_S$ onto the subspace of the unit projection of the spin:


pee Mrrtra) + [(Cy + Co(ri + r2)+ Carie+ Csria(rit ra) sinh Co(ri — 12) 
+ (r4 — ro)(C3 + Coriz) cosh Co(r1 — r2))]; 


where $r_j = \bigl(\sum_{i=1}^{3} x_{ji}^2\bigr)^{1/2}$, $r_{12} = \bigl(\sum_{i=1}^{3} (x_{1i} - x_{2i})^2\bigr)^{1/2}$, and the constants
$k$ and $C_i$ are found experimentally. (E. U. Condon and
G. H. Shortley, The Theory of Atomic Spectra, Cambridge University Press,
London, 1935.)


(c) Approximate symmetries. The group $SU(2)$ acts on the space $\mathcal{H}_S$: on
$L^2(E^3)$ through the quotient group $SO(3)$, and on $\mathbb{C}^2$ by the standard representation.
This is the group of approximate symmetries of the system. The
$\psi$-function of orthohelium is “not too far” from the subspace corresponding to
a suitable representation of $SU(2)$, so we may speak of the principal ($n$), orbital
($l$), and other quantum numbers of the state, as in the case of a hydrogen atom.

(d) Spin. The total angular momentum operator $J^2$ commutes with the
Hamiltonian $H_S$. In the state $n = 2$ and $j = 1$, its eigenvalue is 2 (in atomic
units). The eigensubspace $N \subset \mathcal{H}_S$ corresponding to this eigenvalue is three-dimensional.
Further, the squared spin projection operators $J_x^2$, $J_y^2$, $J_z^2$ commute
in pairs (this is a peculiarity of spin 1). Letting $P$ denote the projection
of $\mathcal{H}_S$ onto $N$, we are then able to embed the partial Boolean algebra $B(E^3)$ in
$B(\mathcal{H}_S)$ by letting a line $a \subset E^3$ correspond to the image in $\mathcal{H}_S$ of the operator
$PJ_a^2$. This takes the place of the somewhat naive picture in 12.2.


Appendix: The Von Neumann Universe 


1. The premises of “naive” Cantorian set theory reduce to the following: a 
set may consist of any distinguishable elements (of the physical or intellectual 
world); a set is uniquely determined by its elements, and any property deter- 
mines a set, namely, the set of objects that have this property. 

However, the formal language of set theory $L_1\mathrm{Set}$ was introduced in order
to describe a more restricted class of sets (a universe). Part of these restrictions 
come from considerations of convenience, and part come from the desire to avoid 
the so-called paradoxes. This gives an “upper bound” for our classes. We give 
a “lower bound” by asking that the class of sets be closed with respect to all 
mathematical constructions needed for certain (ideally, “all”) parts of intuitive 
mathematics. 


2. Following Zermelo, von Neumann, and others, we consider two basic restric- 
tions on sets. 

(a) All elements of sets must themselves be sets. In particular, since any
chain $X_0 \ni X_1 \ni X_2 \ni \cdots$ in the von Neumann universe $V$ must terminate (see
below), it follows that the last element in such a chain must be the empty set.
Thus, all the sets in $V$ are constructed “from nothing.”

(b) The assumption that every collection of sets, even sets as in (a), is
again a set in $V$, immediately leads to contradictions (Burali-Forti, Russell,
and others). In particular, the collection of all sets in the universe is not itself
an element of $V$. Hence, we must give a sufficiently complete description of
which operations do not take us outside of $V$. The two basic formal languages
of set theory, that of Gödel-Bernays and that of Zermelo-Fraenkel, differ in
the choice of objects over which the variable symbols are to range under the
standard interpretation of the language in $V$. In the Zermelo-Fraenkel language
(our $L_1\mathrm{Set}$), they range over the sets in $V$. In the Gödel-Bernays language, they
name classes (collections of sets in $V$) that “are not necessarily sets,” and the
property of “being a set” is specifically defined as the property of “being an
element of another class.” The Gödel-Bernays language is studied in Chapter
4 of Mendelson’s book.

In this section we describe the von Neumann universe using the customary 
terminology of intuitive mathematics. The relationship of this construction to 
formalism will be discussed in Section 18. 


3. The first levels. The von Neumann universe is constructed inductively, start- 
ing from the empty set, by successively applying the “set of all subsets” or 
“power set” operation P. In this way, 


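$$\begin{aligned}
V_0 &= \varnothing, \\
V_1 &= P(\varnothing) = \{\varnothing\}, \\
V_2 &= P(V_1) = \{\varnothing, \{\varnothing\}\}, \\
&\;\;\vdots \\
V_{n+1} &= P(V_n), \\
&\;\;\vdots
\end{aligned}$$

It is easy to see that $V_n \subset V_{n+1}$ (later this will be proved in complete generality).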


The level $V_n$ consists of

$$2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \quad (n - 1 \text{ twos})$$

finite sets, whose elements are also finite sets, and so on.
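The first levels are small enough to be built explicitly. The sketch below is an illustration, not part of the formal development: it models hereditarily finite sets as Python `frozenset` objects, and the helper names `power_set` and `level` are assumptions made for this example.

```python
from itertools import combinations

def power_set(s):
    """The set of all subsets of the finite set s."""
    elems = list(s)
    return frozenset(
        frozenset(c)
        for size in range(len(elems) + 1)
        for c in combinations(elems, size)
    )

def level(n):
    """V_n, obtained by applying the power set operation n times to the empty set."""
    v = frozenset()
    for _ in range(n):
        v = power_set(v)
    return v

# Number of elements of V_0, ..., V_5.
sizes = [len(level(n)) for n in range(6)]
```

The resulting sizes are 0, 1, 2, 4, 16, 65536, in agreement with the tower of $n - 1$ twos for $n \ge 1$; the next level $V_6$ already has $2^{65536}$ elements and cannot be listed.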
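We cannot go beyond finite sets unless we regard all the $V_n$ as “already
constructed” and apply $P$ to the union of the $V_n$. We set

$$V_{\omega_0} = \bigcup_{n=0}^{\infty} V_n, \qquad V_{\omega_0+1} = P(V_{\omega_0}), \qquad \dots$$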

The indices that we now use for the levels are the names of the first infinite 
ordinals. This remarkable idea of transfinite iteration of such constructions is 
due to Cantor, who first applied it to study trigonometric series, and then 
investigated it systematically, finding in it the key to the infinite. 

In the next two subsections our sets will temporarily be Cantorian sets. 
We shall return to V after developing some properties of ordinals. 


4. Ordinals. Let X be any set on which we are given a binary relation <. 
We consider the following properties of this relation: 


(a) $Y \not< Y$ for all $Y \in X$; if $Y_1 < Y_2$ and $Y_2 < Y_3$, then $Y_1 < Y_3$.
(b) For any $Y, Z \in X$, either $Y < Z$ or $Z < Y$, or else $Y = Z$.
(c) Every nonempty subset of $X$ has a least element (in the sense of $<$).

The relation $<$ is a partial ordering of $X$ if it satisfies (a), a linear ordering
of $X$ if it satisfies (a) and (b), and a well-ordering of $X$ if it satisfies all three
conditions (a), (b), and (c).

Let $(X, <)$ be a well-ordering. The initial segment $\hat{Y}$ determined by an
element $Y \in X$ is the well-ordered set $(Z, <)$, where $Z = \{Y' \mid Y' < Y\}$. As is
customary when speaking about a well-ordered set, we shall omit the explicit
indication of the ordering if it is clear from the context.


5. Lemma. Let X and Y be two well-ordered sets. Then exactly one of the 
following alternatives holds: 


(a) X and Y are isomorphic. 
(b) X is isomorphic to an initial segment in Y. 
(c) Y is isomorphic to an initial segment in X. 


In each case the isomorphism is uniquely determined. 


PROOF. We divide the argument into several steps.


(a) Let $X$ be well-ordered, and let $f: X \to X$ be a monotonic map, i.e.,
$Z_1 < Z_2 \Rightarrow f(Z_1) < f(Z_2)$. Then for all $Z \in X$ we have $f(Z) \ge Z$. In
fact, among the elements not having this property there would have to be
a least element $Z_0$. But $f(Z_0) < Z_0$ and the monotonicity of $f$ imply that
$f(f(Z_0)) < f(Z_0)$, so that we would have an even smaller element in the set of
elements not having the desired property.

(b) Therefore $X$ is not isomorphic to any of its initial segments $\hat{X}_1$: if
$f: X \to \hat{X}_1$ were an isomorphism, then $f(X_1) < X_1$, contradicting (a).

(c) Now let $X$ and $Y$ be well-ordered. We set $f = \{(X_1, Y_1) \mid X_1 \in X,\ Y_1 \in Y,$
and there exists an isomorphism of $\hat{X}_1$ with $\hat{Y}_1\}$. First of all, $f$ is the graph of a
one-to-one mapping of $\mathrm{pr}_1 f$ onto $\mathrm{pr}_2 f$. In fact, if $X_1 \ne X_2$, say $X_1 < X_2$, then
by (b), $\hat{X}_1$ is not isomorphic to $\hat{X}_2$; by symmetry, the same holds for $f^{-1}$. It is
also clear from this that $f$ and $f^{-1}$ are monotonic. Further, if $X_1 \in \mathrm{pr}_1 f$ and
$X_2 < X_1$, then $X_2 \in \mathrm{pr}_1 f$, and similarly for $\mathrm{pr}_2 f$. Finally, we show that either
$\mathrm{pr}_1 f = X$, or else $\mathrm{pr}_2 f = Y$. Otherwise, there would exist a minimal element
$X_1$ in $X \setminus \mathrm{pr}_1 f$ and a minimal element $Y_1$ in $Y \setminus \mathrm{pr}_2 f$. But by the preceding
paragraph, $f$ induces an isomorphism of $\hat{X}_1$ with $\hat{Y}_1$. By the definition of $f$, we
then have $(X_1, Y_1) \in f$, a contradiction.

(d) All of this means that either $f$ is an isomorphism (more precisely, the
graph of an isomorphism) of the set $X$ onto $Y$ or an initial segment in $Y$,
or else $f^{-1}$ is an isomorphism of $Y$ onto $X$ or an initial segment of $X$. It
is clear from the definition of $f$ that the graph of any other isomorphism
must be contained in the graph of $f$, so we have uniqueness. The lemma is
proved.


As a preliminary definition, we can now consider the class of all well-ordered
sets isomorphic to some fixed totally ordered set $X$, and call that class an ordinal.
Two ordinals $\alpha$ and $\beta$ satisfy the relation $\alpha = \beta$, $\alpha < \beta$, or $\alpha > \beta$ depending
on which of the alternatives in Lemma 5 holds for representatives $X \in \alpha$ and
$Y \in \beta$ (this obviously does not depend on the choice of representatives).

The next step is, naturally, to consider “all” ordinals as a class and show that
$<$ induces a well-ordering on this class, thereby giving a universal well-ordering.
However, an unnecessary difficulty arises here: the class of well-ordered sets isomorphic
to a fixed $X$ is extremely large, and so the class of ordinals must be
a “class of classes,” which needlessly complicates matters. An elegant technical
discovery, due to von Neumann, removes this difficulty: instead of a vast
number of possible orderings imposed on $X$ from outside, we consider a single
relation given by internal properties. Recall that a set $X$ is transitive if $Z \in X$
whenever $Z \in Y \in X$ for some $Y$.


6. Definition. An ordinal is a transitive set $X$ of sets that is well-ordered by
the relation $\in$ between its elements.


7. Theorem.


(a) The class of ordinals $\mathrm{On}$ is well-ordered by the relation $\alpha \in \beta$ (which we
shall also write $\alpha < \beta$).

(b) Any well-ordered set is isomorphic to a unique ordinal $\alpha$, and also to a
unique initial segment of ordinals (those less than $\alpha \cup \{\alpha\}$).


PROOF.


(a) We must verify conditions (a), (b), and (c) of Section 4. The first of
them follows immediately from the definition.

To prove the second condition, we consider two ordinals $\alpha$ and $\beta$. By Lemma
5, there exists an isomorphism $f$ of one of them, say $\alpha$, onto either $\beta$ or an initial
segment of $\beta$. We show that then $\alpha = \beta$ or $\alpha \in \beta$. To do this, we prove that
$f(\gamma) = \gamma$ for all $\gamma \in \alpha$. In fact, if $\gamma_1$ is the minimal element with $f(\gamma_1) \ne \gamma_1$,
then $f(\gamma_2) = \gamma_2$ for all $\gamma_2 \in \gamma_1$. Since $f$ is an isomorphic embedding of
$\alpha$ with respect to the ordering $\in$, and since $\gamma_1$ and $f(\gamma_1)$ are sets, we have
$f(\gamma_1) = \{f(\gamma_2) \mid \gamma_2 \in \gamma_1\} = \{\gamma_2 \mid \gamma_2 \in \gamma_1\} = \gamma_1$, which contradicts the choice of
$\gamma_1$. The same argument shows that $f(\alpha) = \alpha$, from which the condition follows.

Finally, let $C$ be a nonempty class of ordinals, and let $\alpha \in C$. If $\alpha$ is not the
least element in $C$, then the least element in the intersection $\alpha \cap C$ will be the
least element in $C$.

(b) Let $X$ be a well-ordered set. Let $S$ denote the set of ordinals that are
isomorphic to some initial segment in $X$. $S$ is nonempty, since, for example,
the ordinal $\{\varnothing\}$ is isomorphic to the segment consisting of the least element of
$X$. It is easy to see that the set $\beta = \bigcup_{\alpha \in S} \alpha$ is an ordinal. We claim that $\beta$ is
isomorphic to $X$. In fact, if this were not the case, then $\beta$ would be isomorphic
to an initial segment in $X$, say $\hat{X}_1$. But then the ordinal $\beta \cup \{\beta\}$, which is larger
than $\beta$, would be isomorphic to the initial segment $\hat{X}_1 \cup \{X_1\}$, contradicting
the definition of $\beta$.


We now give the elementary properties of ordinals. 


8. (a) The finite ordinals are the “natural numbers” (and zero) in the first levels 
of the universe V. Thus, we shall write 


$$0 = \varnothing, \quad 1 = \{\varnothing\}, \quad 2 = \{\varnothing, \{\varnothing\}\}, \quad 3 = \{\varnothing, \{\varnothing\}, \{\varnothing, \{\varnothing\}\}\}, \ \dots.$$

(b) The ordinal that immediately follows a given $\alpha$ is $\alpha \cup \{\alpha\}$. It is also
denoted by $\alpha + 1$, which agrees with the notation in (a) in the case of finite $\alpha$.

(c) An ordinal $\beta$ is called a limit ordinal if $\beta \ne \varnothing$ and $\beta \ne \alpha + 1$ for any $\alpha$.
The first limit ordinal $\omega_0$ is isomorphic as a totally ordered set to $\{0, 1, 2, 3, \dots, n, \dots\}$.
If $\alpha$ is a limit ordinal, then $\alpha = \bigcup_{\beta < \alpha} \beta$. The converse is also true.
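For finite ordinals, the definitions in (a) and (b) can be carried out literally. The following sketch models sets as Python `frozenset` objects; the function names are illustrative assumptions. For hereditarily finite sets, being transitive and linearly ordered by membership already suffices for being an ordinal, because every finite linear ordering is a well-ordering.

```python
def successor(a):
    """The ordinal a + 1 = a U {a}."""
    return a | frozenset([a])

def ordinal(n):
    """The finite von Neumann ordinal n, built from the empty set."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a

def is_transitive(x):
    """Z in Y in x implies Z in x."""
    return all(z in x for y in x for z in y)

def is_linearly_ordered_by_membership(x):
    """Any two elements of x are equal or one is a member of the other."""
    return all(y == z or y in z or z in y for y in x for z in x)

three = ordinal(3)
not_an_ordinal = frozenset([ordinal(1)])  # the set {1}: it lacks the element 0
```

The ordinal `three` equals the set $\{0, 1, 2\}$ of its predecessors, while `not_an_ordinal` fails the transitivity test.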

Ordinals are mainly used for three purposes: proofs using (transfinite) in- 
duction, constructions using (transfinite) recursion, and measuring cardinalities. 
Here are the basic principles. 


9. Transfinite induction. Let $C$ be a class of ordinals for which


(a) $\varnothing \in C$.
(b) If $\alpha \in C$, then $\alpha + 1 \in C$.
(c) If a set of ordinals $\{\alpha_i\}$ is contained as a subset in $C$, then $\bigcup \alpha_i \in C$.


Then C contains all ordinals. 


In fact, otherwise there would exist a least ordinal not in C, but this could 
not be the empty set by (a), a limit ordinal by (c), or any other ordinal by (b). 
In concrete applications, the verifications of (a) and (c) are often trivial and are 
omitted. 


10. Transfinite recursion. Let $G$ be a function of sets (it will actually be sufficient
to assume that $G$ is defined on all sets in the universe) whose values are sets.
Then there exists a unique function $F$ on the ordinals such that

$$F(\alpha) = G(\text{the set of values of } F \text{ on the elements of } \alpha).$$

In fact, this equality uniquely determines $F(0) = G(\varnothing)$, and then
$F(1) = G(\{F(0)\})$, $F(2) = G(\{F(0), F(1)\})$, and so on. Thus, if we consider the class $C$
of ordinals $\alpha$ for which we can define $F$ with the required property on the initial
segment of ordinals $< \alpha$, then $C$ satisfies the conditions 9(a)-(c), and therefore
contains all the ordinals. Uniqueness follows similarly (if $F \ne F'$, consider the
least $\alpha$ with $F(\alpha) \ne F'(\alpha)$).
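Restricted to finite ordinals, the recursion scheme is an ordinary loop. The sketch below is an illustration under stated assumptions: sets are Python `frozenset` objects, and the two sample functions `G_identity` and `G_levels` are hypothetical choices of $G$. The first reproduces the ordinals themselves, $F(n) = n$; the second, $G(S) = \bigcup_{x \in S} P(x)$, reproduces the finite levels $V_n$.

```python
from itertools import combinations

def power_set(s):
    elems = list(s)
    return frozenset(
        frozenset(c)
        for size in range(len(elems) + 1)
        for c in combinations(elems, size)
    )

def recurse(G, n):
    """Return [F(0), ..., F(n)], where F(m) = G({F(0), ..., F(m-1)})."""
    values = []
    for _ in range(n + 1):
        values.append(G(frozenset(values)))
    return values

def G_identity(S):
    # F(m) = {F(0), ..., F(m-1)}, which yields the von Neumann ordinal m.
    return S

def G_levels(S):
    # Union of the power sets of all previous values.
    out = frozenset()
    for x in S:
        out |= power_set(x)
    return out

ordinals = recurse(G_identity, 3)
levels = recurse(G_levels, 4)
```

Because the earlier values are nested, the union in `G_levels` reduces to the power set of the last value, so `levels` has sizes 0, 1, 2, 4, 16.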


11. Measuring cardinalities. Different ordinals can have the same cardinality.
For example, all the ordinals $\omega_0, \omega_0 + 1, \omega_0 + 2, \dots$ (and many more after them!)
are countable. However, jumps in cardinality occur arbitrarily far out.

An ordinal that does not have the same cardinality as any lower ordinal is
called a cardinal. All finite ordinals and $\omega_0$ are cardinals. Clearly, any infinite
cardinal is a limit ordinal. Further, any set has the same cardinality as some
cardinal, and in fact, a unique one (see §1 of Chapter III). The infinite cardinals
form a totally ordered class, which is naturally indexed by ordinals. Thus

$\omega_0$ = the first countable ordinal,
$\omega_1$ = the first ordinal of cardinality $> \omega_0$

 = the set of all finite and countable ordinals,
$\omega_2$ = the first ordinal of cardinality $> \omega_1$

 = the set of all ordinals of cardinality $\le \omega_1$,


and so on.

We can now give our fundamental definition.


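12. Definition. The (von Neumann) universe $V$ is the class of sets $\bigcup_{\alpha \in \mathrm{On}} V_\alpha$,
where the set $V_\alpha$ is defined by the following transfinite recursion:


$$\begin{aligned}
V_0 &= \varnothing, \\
V_{\alpha+1} &= P(V_\alpha), \\
V_\alpha &= \bigcup_{\beta < \alpha} V_\beta, \quad \text{if } \alpha \text{ is a limit ordinal.}
\end{aligned}$$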


We give some elementary properties of the universe V. 


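13. Each of the sets $V_\alpha$ is transitive: if $Y \in X \in V_\alpha$, then $Y \in V_\alpha$. (In other
words, $V_\alpha \subset V_{\alpha+1}$.)

Suppose that this were not true. Then there would exist a least ordinal $\alpha$
with $V_\alpha \not\subset V_{\alpha+1}$, where $\alpha \ge 2$. If $\alpha$ is not a limit ordinal, $\alpha = \beta + 1$,
$Y \in X \in V_\alpha$, and $Y \notin V_\alpha$, then we obtain a contradiction as follows:
$X \in V_{\beta+1} = P(V_\beta) \Rightarrow X \subset V_\beta \Rightarrow Y \in V_\beta \Rightarrow Y \in V_{\beta+1} = V_\alpha$, since for $\beta$ it is still true that
$V_\beta \subset V_{\beta+1}$ by our choice of $\alpha$. If $\alpha$ is a limit ordinal, the argument is analogous
(find $\gamma < \alpha$ with $Y \in X \in V_\gamma$ and $Y \notin V_\gamma$).

We define the rank of any set $X \in V$ as follows: $\operatorname{rank} X = \alpha$ if $\alpha$ is the least
ordinal such that $X \in V_{\alpha+1}$. If $Y \in X$, then $\operatorname{rank} X \ge \operatorname{rank} Y + 1$.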


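14. All ordinals belong to $V$, and $\operatorname{rank} \alpha = \alpha$.

We first show that $\alpha \in V_{\alpha+1}$ for all ordinals $\alpha$. This is true for $\alpha = 0$. Suppose
that $\alpha$ is the least ordinal with $\alpha \notin V_{\alpha+1}$. If $\alpha = \beta + 1$, then $\beta \in V_{\beta+1}$, so that
$\beta$ and $\{\beta\} \in V_{\beta+2} = P(V_{\beta+1})$, and hence $\alpha = \beta + 1 = \beta \cup \{\beta\} \in V_{\beta+2} = V_{\alpha+1}$,
a contradiction. On the other hand, if $\alpha$ is a limit ordinal, then $\alpha = \bigcup_{\beta<\alpha} \beta$ and
$\beta \in V_{\beta+1} \subset V_\alpha$ by the choice of $\alpha$, so that $\alpha = \bigcup_{\beta<\alpha} \beta \subset \bigcup_{\beta<\alpha} V_\beta = V_\alpha$, and
$\alpha \in P(V_\alpha) = V_{\alpha+1}$, a contradiction. Therefore, $\operatorname{rank} \alpha \le \alpha$. We similarly prove
equality.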


15. The universe $V$ is closed with respect to the standard set operations: difference,
union, intersection, forming $P(X)$ and $\bigcup_{Y \in X} Y$, and “collecting” sets
indexed by any set: $\{X_Y \mid Y \in Z\}$. In particular, if $X, Y \in V_\alpha$, then the pair
$\{X, Y\}$ is in $V_{\alpha+1}$. We write $\{X\}$ in place of $\{X, X\}$.


16. Direct products, relations, and functions can also be defined as elements 
of V using a device of Kuratowski. The intuitive notion of an ordered pair of 
sets X,Y € V is realized by means of the set 


$$(X, Y) = \{\{X\}, \{X, Y\}\} \in V.$$


As elements of $V$, ordered pairs are characterized by the following properties:
an ordered pair is a set of two elements $X'$ and $Y'$, one of which is a subset of
the other (say $X' \subset Y'$); if $X' \subset Y'$, then $X' = \{X\}$ is a one-element set, and
$X$ is called the first term of the pair; $Y'$ is a set of at most two elements, and its
element $Y$ that is different from $X$ (if it exists) or $X$ itself (otherwise) is called
the second term of the pair. Thus, $(X, Y) = (X'', Y'')$ if and only if $X = X''$
and $Y = Y''$, which justifies the name “ordered pair.”

We emphasize that this definition is introduced so that the direct product
construction does not leave the universe $V$, and so that a set corresponding to a
direct product can be described in terms of the relation $\in$, i.e., in the language
$L_1\mathrm{Set}$.
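Kuratowski's device is easy to experiment with on hereditarily finite sets. The following sketch models sets as Python `frozenset` objects; the function names `kpair`, `first`, `second`, and `direct_product` are illustrative assumptions. The two terms are recovered purely from membership, following the characterization above.

```python
def kpair(x, y):
    """The Kuratowski pair (x, y) = {{x}, {x, y}}."""
    return frozenset([frozenset([x]), frozenset([x, y])])

def first(p):
    """The first term: the only element common to all members of p."""
    (x,) = frozenset.intersection(*p)
    return x

def second(p):
    """The second term: the other element of the union, or x itself if none."""
    rest = frozenset.union(*p) - frozenset([first(p)])
    if rest:
        (y,) = rest
        return y
    return first(p)

def direct_product(X, Y):
    """The set of all pairs (u, w) with u in X and w in Y."""
    return frozenset(kpair(u, w) for u in X for w in Y)

zero = frozenset()
one = frozenset([zero])
two = frozenset([zero, one])

p = kpair(zero, one)
q = kpair(one, zero)
diagonal = kpair(one, one)  # degenerates to the one-element set {{1}}
```

The pairs `p` and `q` are different sets although they have the same terms, and `diagonal` shows the degenerate case in which the pair has only one element.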

An ordered n-tuple of sets is defined as 


(X1,.--,Xn) = (++ ((Xy, Xo), X3) +++). 
We define the direct product of two sets as 
XxY={(U,W)|U © X,W EY}. 


Similarly, 
XyX+++ X Xp = (+--+ ((X & XQ) x X3) X +++). 


We note that in general, $(X \times Y) \times Z \ne X \times (Y \times Z)$; we
have only a canonical one-to-one correspondence between these two sets. But it is
usually harmless to take the liberty of identifying the two sets and writing
$X \times Y \times Z$.
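
The Kuratowski construction can be tried out on small hereditarily finite sets.
The following is a minimal sketch, assuming sets are modeled by Python
`frozenset` values; the names `pair`, `first`, `second`, and `direct_product`
are illustrative and do not come from the text. It recovers the two terms of a
pair using only membership, and shows on one example that the two bracketings
of a triple product are different sets of the same size.

```python
def pair(x, y):
    """Kuratowski ordered pair (X, Y) = {{X}, {X, Y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def first(p):
    """First term: the single element shared by all members of the pair."""
    (x,) = frozenset.intersection(*p)
    return x


def second(p):
    """Second term: the element different from the first term if it exists,
    otherwise the first term itself."""
    x = first(p)
    others = frozenset.union(*p) - {x}
    if others:
        (y,) = others
        return y
    return x


def direct_product(xs, ys):
    """X x Y = {(U, W) | U in X, W in Y}."""
    return frozenset(pair(u, w) for u in xs for w in ys)


# Small hereditarily finite sets: 0 = {}, 1 = {0}, 2 = {0, 1}.
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

X, Y, Z = frozenset({zero, one}), frozenset({one, two}), frozenset({zero})
left = direct_product(direct_product(X, Y), Z)    # (X x Y) x Z
right = direct_product(X, direct_product(Y, Z))   # X x (Y x Z)
print(len(left), len(right), left == right)       # prints: 4 4 False
```

The two products have the same number of elements, in line with the canonical
one-to-one correspondence, but they are not equal as sets.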

A binary relation (or correspondence) $r$ is a set (or class) all of whose
elements are ordered pairs. If $r \in V$ is a relation, then its domain of
definition dom($r$) is the class of all first terms in the elements of $r$, and
the range of values rng($r$) is the class of all second terms.

A function is a binary relation in which each element is uniquely determined
by its first term. Thus, functions that are maps of sets in $V$ are identified with
their graphs. If $f$ is a function, we often write $W = f(U)$ instead of
$(U, W) \in f$. In addition, we set


$$f^{-1}(X) = \{Y \mid f(Y) \in X\},$$
$$f|_X = \{(U, W) \in f \mid U \in X\}.$$


A family $\{X_Y \mid Y \in Z\}$ as an element of $V$ is defined to be a function
consisting of pairs $\{(Y, X_Y) \mid Y \in Z\}$, and so on.

We again emphasize that the most important feature of these definitions is
that we do not introduce any new objects besides elements of $V$, or any new
relations other than those expressible in terms of $\in$. It should also be noted
that in accordance with the usual (“extensional”) notion, a property of the
elements of a set $X \in V$ is a subset $Y \subset X$ (consisting of all elements
with this property). Thus, $Y \in V$, so that properties, properties of
properties, properties of sets of properties, . . . (with transfinite iteration)
are elements of $V$.

The “universe” V has earned its name. 


17. Finally, we show that a chain $X_1 \ni X_2 \ni \cdots$ of elements of $V$ must
terminate (of course, with the empty set).

We prove that if $X$ is nonempty, then there exists a $Y \in X$ with
$Y \cap X = \emptyset$ (the desired result is obtained if we apply this to the
set $X$ of terms in the chain). In fact, let $Y$ be an element of least rank in
$X$ (which exists because the ranks, since they are ordinals, are well-ordered).
If we had $X \cap Y \ne \emptyset$, then any element $Z \in X \cap Y$ would have
lower rank than $Y$, a contradiction.
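
For hereditarily finite sets the rank argument can be checked mechanically. The
sketch below is an illustration only: it assumes sets are modeled by Python
`frozenset` values and uses the standard recursive description of rank (the
least ordinal strictly greater than the ranks of all elements), which agrees
with the definition via the levels $V_\alpha$. The helper names `rank`,
`ordinal`, and `least_rank_element` are hypothetical.

```python
def rank(x):
    """Rank of a hereditarily finite set: 0 for the empty set, otherwise
    one more than the largest rank of an element."""
    return max((rank(y) + 1 for y in x), default=0)


def ordinal(n):
    """The finite von Neumann ordinal n = {0, 1, ..., n - 1}."""
    a = frozenset()
    for _ in range(n):
        a = a | frozenset({a})   # successor: a + 1 = a U {a}
    return a


def least_rank_element(x):
    """An element Y of the nonempty set X whose rank is minimal."""
    return min(x, key=rank)


# rank(n) = n for the first few finite ordinals.
print([rank(ordinal(n)) for n in range(6)])   # prints: [0, 1, 2, 3, 4, 5]

# X = {1, 2, 3}: the element of least rank is disjoint from X,
# while the element 2 = {0, 1} is not, since it contains 1.
X = frozenset({ordinal(1), ordinal(2), ordinal(3)})
Y = least_rank_element(X)
print(len(Y & X), len(ordinal(2) & X))        # prints: 0 1
```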
18. Connection with the axioms of $L_1$Set. The point of view adopted in this
book is as follows. 

The intuitive notion of a set, to which we appealed when constructing the 
universe $V$, is the primary material. The language $L_1$Set was devised in order
to write formal texts based on this material that are equivalent to our intuitive
arguments concerning $V$. The axioms of $L_1$Set (including the logical axioms)
are obtained as a result of analyzing intuitive proofs. Our criterion for the
completeness of this list is that we can write a formal deduction that translates
any intuitive proof. The fact that we are able to do this must be proved by a
rather large compendium of formal texts, which can be found in other books on
logic. In particular, in $L_1$Set we can write the formula
“$\forall x\ \exists\,\text{ordinal}\ \alpha\ (x \in V_\alpha)$”
and deduce it from the axioms. This formula is the formal expression of our 
restriction to sets in V. 

The question of the formal consistency of the Zermelo—Fraenkel axioms must 
remain a matter of faith, unless and until a formal inconsistency is demon- 
strated. So far all the proofs that have been based on these axioms have never 
led to a contradiction; rather, they have opened up before us the rich world of 
classical and modern mathematics. This world has a certain reality and life of 
its own, which depends little on the formalisms called upon to describe it. 

The discovery of a contradiction in any of various formalisms, even if it 
should occur, would merely serve to clarify, refine, and perhaps reconstruct 
certain of our ideas, as has happened several times in the past, but would not 
lead to their downfall. 


The Last Digression. Truth as Value and Duty: Lessons of 
Mathematics. 


1. Introduction. Imagine that you open your morning newspaper and read 
the following report: 


Brownsville, AR. A local object partially immersed in a liquid was buoyed 
upward Tuesday by a force equal to the weight of the liquid displaced by that 
object, witnesses at the scene reported. As of press time, the object is still main- 
taining positive buoyancy. 

In fact, I did read this report in the Onion¹; I have abridged it only to add
a Fénéonian touch.

If this book had been dedicated to the nature of the comical, one could have 
produced an interesting analysis of the clever silliness of this parody. But since 
we are preoccupied with mathematical truth, I will use it in order to illustrate 
the differences between the attitudes to truth among practitioners of social 
sciences and law on the one hand, and that of physicists, on the other. 


¹ The Onion is a satirical newspaper, owned by an American “fake news” organization
Onion, Inc., based in New York.
To put it crudely, in social sciences information comes from witnesses; but in
what sense was Archimedes’ role in his discovery that of a witness, and are the 
experimental observations generating/supporting a physical theory on an equal 
footing with the observations of witnesses to a crime scene, or respondents to 
a poll? 

Now imagine another report, which could have been posted on the website 
of the Department of Physics of Cambridge University: 

The Cavendish Laboratory News & Features bulletin announced yesterday
that a Cavendish student has won The Science, Engineering and Technology
award. He managed to measure the constant $\pi$ with unprecedented precision:
$\pi = 3.1415925\ldots$ with an error $\pm 2$ at the last digit.

I must confess right away that I did not read but simply fabricated this 
spoof in order to stress the further differences between the attitudes toward 
truth now held by physicists and by mathematicians respectively. 


Literally speaking, such an announcement would make perfect sense: the
mathematical constant $\pi$ can be measured with some precision, in the same
way that any physical constant such as the speed of light $c$ or the mass of the
electron can be measured. The maximum achievable precision, at least of a
“naive” direct measurement of $\pi$, is determined by the degree to which we can
approximate ideal Euclidean rigid bodies by real physical ones. The limits to 
this approximation are set by the atomic structure of matter, and in the final 
analysis, by quantum effects. 

On the other hand, in order to get in principle as many digits of $\pi$ as one
wishes, measurements are not required at all. Instead, one can use one of the
many existing formulas/algorithms/software codes and do it on a sheet of paper,
a pocket calculator, or a supercomputer. This time the limits of precision are 
determined by the physical limitations of our calculator: the size of the sheet 
of paper, memory of computer, construction of the output device, available 
time.... 

What I want to stress now is that $\pi$ imagined as an infinite sequence of its
digits is not amenable to a “finite” calculation: even the number of digits of
$\pi$ equal to the number of atoms in the observable universe would not exhaust
$\pi$. As Wisława Szymborska beautifully put it:

heaven and earth shall pass away, 

but not pi, that won’t happen, 

it still has an okay five, 

and quite a fine eight, 

and all but final seven, 

prodding and prodding a plodding eternity 

to last. 

Nevertheless, mathematicians speak about $\pi$ and work with $\pi$ as if it were a
completely well defined entity, graspable in its entirety not only by one
exceptional supermind, but by the minds of all trained researchers, never doubting
that when they speak of $\pi$, they speak about one and the same ideal object, as
rigid as if it really existed in some Platonic world. 
One facet of this rigidity can be expressed by a few theorems implying
that whatever power series, integral, limit, and software code we might use
to calculate $\pi$ and whatever precision we choose, we will always get the same
result. If we do not, either our formula was wrong, or the calculator made a 
mistake/there was a bug in the code/output device could not cope with the 
quantity of information .... 
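
This reproducibility can be observed on a pocket-calculator scale. The sketch
below is only an illustration: the choice of formulas is an assumption of this
example and is not taken from the text. It evaluates $\pi$ in exact integer
arithmetic from two different arctangent identities, Machin's formula
$\pi/4 = 4\arctan(1/5) - \arctan(1/239)$ and the identity
$\pi/4 = \arctan(1/2) + \arctan(1/3)$, and compares the digits. The function
names and the number of guard digits are likewise hypothetical choices.

```python
def arctan_inv(x, unity):
    """Approximate unity * arctan(1/x) with the alternating series
    1/x - 1/(3 x^3) + 1/(5 x^5) - ..., using integers only."""
    term = unity // x
    total = term
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2              # unity / x^(2k+1), rounded down
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


GUARD = 10  # extra digits that absorb the rounding errors of the series


def pi_machin(digits):
    """Integer part of pi * 10**digits via Machin's formula."""
    unity = 10 ** (digits + GUARD)
    value = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return value // 10 ** GUARD


def pi_two_three(digits):
    """Integer part of pi * 10**digits via arctan(1/2) + arctan(1/3)."""
    unity = 10 ** (digits + GUARD)
    value = 4 * (arctan_inv(2, unity) + arctan_inv(3, unity))
    return value // 10 ** GUARD


print(pi_machin(30))                      # prints: 3141592653589793238462643383279
print(pi_machin(30) == pi_two_three(30))  # prints: True
```

The two series converge at very different speeds, yet the digits coincide, as
the theorems alluded to above guarantee.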

Contemplating this example, we may grasp the meaning of the succinct 
description of mathematics by Davis and Hersh: “the study of mental objects
with reproducible properties.”

However, I want to use this example in order to stress that most of the 
deep mathematical truths are about infinity and infinitary mental constructs 
rather than experimentally verifiable finitary—and finite—operations that can 
be modeled using actual objects of the physical world. 


2. Infinity, Georg Cantor, and truth. Before Georg Cantor, infinity 
appeared in mathematical theorems mostly implicitly, through the quantifier 
“all” (which also could be only implicit as in most of Euclid’s theorems). 

Cantor proved the first theorem ever in which infinities themselves were 
objects of consideration and of a highly nontrivial discovery. 

When Cantor first presented his diagonal argument in a letter to Dedekind 
in 1873, it was worded differently and used only to prove that the cardinality of 
the natural numbers is strictly less than that of the real numbers. The discovery 
of the proof itself was in a sense hardly less important than the discovery of the 
definition of what it means for one infinity to be larger than another one. 

As soon as this was achieved, Cantor started thinking about the cardinal- 
ity of the reals compared with that of the pairs of reals, or, geometrically, 
sets of points of a curve and of a surface respectively. They turned out to be 
equal! If we have a pair of numbers $(\alpha, \beta)$ in $(0,1)$, Cantor suggested
producing from them a third number $\gamma \in (0,1)$ by putting the decimal
digits of $\alpha$ in the odd places and those of $\beta$ in the even places. One
sees that conversely, $(\alpha, \beta)$ can be reconstructed from $\gamma$.
Dedekind, who was informed by Cantor’s letter about this discovery as well,
remarked that this does not quite work because some rational numbers have two
decimal representations, such as $0.499999\ldots = 0.5000000\ldots$. Cantor had
to spend some time to amend the proof, but this was a minor embarrassment, in
comparison with the fascinating novelty of the fact itself: “Ce que je vous ai
communiqué tout récemment est pour moi si inattendue, si nouveau, que je ne
pourrai pour ainsi dire pas arriver à une certaine tranquillité d’esprit avant
que je n’aie reçu, très honoré ami, votre jugement sur son exactitude. Tant que
vous ne m’aurez pas approuvé, je ne puis que dire: je le vois, mais je ne le
crois pas.”

“T see it but I do not believe it,” Cantor famously wrote to Dedekind. 
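
Cantor's interleaving of digits, and the point of Dedekind's remark, can be
sketched on finite truncations of decimal expansions. The code below is an
illustration under that assumption: a number in $(0,1)$ is represented by a
string holding the first few digits after the decimal point, and the function
names `interleave` and `split` are hypothetical.

```python
from typing import Tuple


def interleave(a: str, b: str) -> str:
    """Put the digits of a in the odd places and those of b in the even places."""
    if len(a) != len(b):
        raise ValueError("both digit strings must have the same length")
    return "".join(x + y for x, y in zip(a, b))


def split(c: str) -> Tuple[str, str]:
    """Recover the two digit strings from the interleaved one."""
    return c[0::2], c[1::2]


gamma = interleave("1415", "7182")
print(gamma, split(gamma))          # prints: 17411852 ('1415', '7182')

# Dedekind's remark: 0.4999... and 0.5000... denote the same number,
# but the two expansions are sent to different digit strings.
print(interleave("4999", "0000"))   # prints: 40909090
print(interleave("5000", "0000"))   # prints: 50000000
```

On digit strings the map is perfectly invertible; the difficulty arises only
when strings are identified with real numbers, because one must then choose one
of the two expansions, and the number $0.40909090\ldots$ is no longer reached
if expansions ending in nines are excluded.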

This returns us to the basic question on the nature of truth. 

We are reminded that the notion of “truth” is a reification of a certain rela- 
tionship between humans and texts/utterances/statements, the relationship that 
is called “belief,” “conviction,” or “faith,” and which itself should be analyzed, 
together with other primary notions invoked in this definition. 
S. Blackburn in his keynote talk “Truth and Ourselves: The Elusive Last
Word” at The Balzan Symposium on Truth, 2008, extensively discussed other 
relationships of humans to texts, such as scepticism, conservatism, relativism, 
deflationism. However, in the long run all of them are secondary in the practice 
of a researcher in mathematics. 

So I will return to truth. 

I will skip analysis of the notion of “humans” :=) and will only sketch what 
must be said about texts, sources of conviction, and methods of conviction 
peculiar to mathematics. 


(i) Texts. Alfred North Whitehead allegedly said that all of Western philos- 
ophy was but a footnote to Plato. 

The underlying metaphor of such a statement is, “Philosophy is a text,” the 
sum total of all philosophic utterances. 

Mathematics decidedly is not a text, at least not in the same sense as 
philosophy. There are no authoritative books or articles to which subsequent 
generations turn again and again for wisdom. Except for historians, nobody 
reads Euclid, Newton, Leibniz, or Hilbert in order to study geometry, calculus, 
or mathematical logic. The life span of any mathematical paper or book can 
be years; in the best (and exceptional) case, decades. Mathematical wisdom, if 
not forgotten, lives as an invariant of all its (re)presentations in a permanently 
self-renewing discourse. 


(ii) Sources and methods of conviction. Mathematical truth is not revealed,
and its acceptance is not imposed by any authority. 

Moreover, mathematical truth decidedly is not something that can be 
ascertained, as Justice Oliver Wendell Holmes put it, by “the majority vote 
of the nation that could lick all the others.” Equally laughable is his idea that 
“the best test of truth is the power of the thought to get itself accepted in the 
competition of the market.” 

If this means that truth is not a democratic value, then something is wrong 
with our conception of democracy. 

Ideally, the truth of a mathematical statement is ensured by a proof, and 
the ideal picture of a proof is a sequence of elementary arguments whose rules 
of formation are explicitly laid down before the proof even begins, and ideally 
are common for all proofs that have been devised and can be devised in the 
future. 

This ideal picture is so rigid that it can itself become the subject of math- 
ematical study, and the first two chapters of this book were dedicated to the 
presentation of the results of this soul-searching activity of our transgenera- 
tional community. 

Of course, real-life proofs are rendered in a peculiar mixture of a natural 
language, formulas, motivations, and examples. They are much more condensed 
than imaginary formal proofs. The ways of condensing them are not systematic 
in any way. We are prone to mistakes, to taking on trust others’ results that 
can be mistaken as well, and to relying upon authority and revelations from 
our teachers. (All of this should have been discussed together with the notion 
of “humans,” which I have wisely avoided.) 
Moreover, the discovery of truth may, and usually does, involve
experimentation, nowadays vast and computer-assisted, false steps, sudden insights, and
all that which makes mathematical creativity so fascinating to its adepts. 

One metaphor of proof is a route, which might be a desert track, boring and 
unimpressive until one finally reaches the oasis of one’s destination, or a foot- 
path in green hills, exciting and energizing, opening great vistas of unexplored 
lands and seductive offshoots, leading far away even after the initial destination 
point has been reached. 


3. Mathematics and Cognition 
[...] “mismanagement and grief”: here you have that 
enormous distance between cause and effect covered in one line. 
Just as math preaches how to do it. 


J. Brodsky. On “September 1, 1939” by W. H. Auden.


Mathematics is most visible to the general public when it posits itself as an 
applied science, and in this role the notion of mathematical truth acquires 
distinctly new features. For example, our initial discussion of $\pi$ as an
essentially nonfinitary (“irrational”) real number becomes pointless; whenever
$\pi$ enters any practical calculation, the first few digits are all that matters.

In a wider context than just applied science, mathematics can be fruit- 
fully conceived as a toolkit containing powerful cognitive devices. I have 
argued elsewhere that these devices can be roughly divided into three over- 
lapping domains: models, theories, and metaphors. Quoting from my book 
Mathematics as Metaphor: 


A mathematical model describes a certain range of phenomena quali- 
tatively or quantitatively but feels uneasy pretending to be something 
more. 

From Ptolemy’s epicycles (describing planetary motions, ca 150) to the 
Standard Model (describing interactions of elementary particles, ca 
1960), quantitative models cling to the observable reality by adjust- 
ing numerical values of sometimes dozens of free parameters (> 20 for 
the Standard Model). Such models can be remarkably precise. 
Qualitative models offer insights into stability/instability, attractors 
which are limiting states tending to occur independently of initial con- 
ditions, critical phenomena in complex systems which happen when the 
system crosses a boundary between two phase states, or two basins of 
different attractors. [...] 

What distinguishes a (mathematically formulated physical) theory from 
a model is primarily its higher aspirations. A modern physical theory 
generally purports that it would describe the world with absolute pre- 
cision if only it (the world) consisted of some restricted variety of stuff: 
massive point particles obeying only the law of gravity; electromagnetic 
field in a vacuum; and the like. [...] 

A recurrent driving force generating theories is a concept of a reality 
beyond and above the material world, reality which may be grasped 
only by mathematical tools. From Plato’s solids to Galileo’s “language
of nature” to quantum superstrings, this psychological attitude can 
be traced sometimes even if it conflicts with the explicit philosophical 
positions of the researchers. 

A (mathematical) metaphor, when it aspires to be a cognitive tool, 
postulates that some complex range of phenomena might be com- 
pared to a mathematical construction. The most recent mathematical 
metaphor I have in mind is Artificial Intelligence (AI). On the one hand, 
AI is a body of knowledge related to computers and a new, technologically
created reality, consisting of hardware, software, Internet etc. On
the other hand, it is a potential model of functioning of biological brains 
and minds. In its entirety, it has not reached the status of a model: 
we have no systematic, coherent and extensive list of correspondences 
between chips and neurons, computer algorithms and brain algorithms. 
But we can and do use our extensive knowledge of algorithms and com- 
puters (because they were created by us) to generate educated guesses 
about structure and function of the central neural system [...].

A mathematical theory is an invitation to build applicable models. 
A mathematical metaphor is an invitation to ponder upon what we 
know. 


As an aside, let us note that George Lakoff’s definition of poetic metaphors 
such as “love is a journey” (in G. Lakoff. “The Contemporary Theory of 
Metaphor.” In: A. Ortony (ed.), Metaphor and Thought (2nd ed.). Cambridge 
Univ. Press, 1993) is itself expressed as a mathematical metaphor using the 
characteristic Cantor-Bourbaki mental images and vocabulary: “More techni- 
cally, the metaphor can be understood as a mapping (in the mathematical sense) 
from a source domain (in this case, journeys) to a target domain (in this case, 
love). The mapping is tightly structured. There are ontological correspondences, 
according to which entities in the domain of love (e.g. the lovers, their common 
goals, their difficulties, the love relationship, etc.) correspond systematically to 
entities in the domain of a journey (the travellers, the vehicle, destinations, 
etc.).” 

When a mathematical construction is used as a cognitive tool, the discussion 
of truth becomes loaded with new meanings: a model, a theory, or a metaphor 
must be true to a certain reality, more tangible and real than the Platonic 
“reality” of pure mathematics. In fact, philosophers of science routinely dis- 
cussed truth precisely in this context. Karl Popper’s vision of scientific theories 
in terms of falsifiability (versus verifiability) is quite appropriate in the context 
of highly mathematicised theories as well. 

What I want to stress here, however, is one aspect of contemporary math- 
ematical models that is historically very recent. Namely, models are more and 
more widely used as “black boxes” with hidden computerized input procedures, 
and oracular outputs prescribing behavior of human users. 

Mary Poovey, discussing financial markets from this viewpoint, remarks in 
her insightful essay “Can Numbers Ensure Honesty? Unrealistic Expectations 
and the US Accounting Scandal” (Notices of the AMS, vol. 50:1, Jan. 2003, 
pp. 27-35), that what she calls “representations” (computerized bookkeeping or
the numbers a trader enters in a computer) tend to replace the actual exchange 
of cash or commodities. “This conflation of representation and exchange has 
all kinds of material effects [...] for when representation can influence or take
the place of exchanges, the values at stake become notional too: they can grow 
exponentially or collapse at the stroke of key.” 


In fact, actions of traders, banks, hedge funds, and the like are to a consider- 
able degree determined by the statistical models of financial markets encoded in 
the software of their computers. These models, now essentially defining financial 
markets, thus become a hidden and highly influential part of the actions, our 
computerized “collective unconscious.” As such, they cannot even be judged 
according to the usual criteria of choosing models that better reflect the 
behavior of a process being modeled. They are part of any such process. 


What becomes more essential than their empirical adequacy is, for example, 
their stabilizing or destabilizing potential. Risk management assuming mild 
variability and small risks can collapse when a disaster occurs, ruining many 
participants of the game; risk management based on models that use pessimistic 
“Lévy distributions” rather than omnipresent Gaussians paradoxically tends to 
flatten the shock waves and thus avoid major disasters (B. Mandelbrot). 


4. Truth as value. When in the twentieth century mathematicians got 
involved in heated discussions about the so-called crisis in the foundations of 
mathematics, several issues were intermingled. 


Philosophically minded logicians and professional philosophers were
engaged with the nature and accessibility of mathematical truth (and relia- 
bility of our mental tools used in the process of acquiring it). 


Logicists (finitists, formalists, intuitionists) were elaborating severe norma- 
tive prescriptions trying to outlaw dangerous mental experiments with infinity, 
nonconstructivity, and reductio ad absurdum. 


For a working mathematician, when he/she is concerned at all, “founda- 
tions” is simply a general term for the historically variable set of rules and 
principles of organization of the body of mathematical knowledge, both existing 
and being created. From this viewpoint, the most influential foundational 
achievement in the twentieth century was an ambitious project of the Bourbaki 
group, building all mathematics, including logic, around set-theoretic “struc- 
tures” and making Cantor’s language of sets a common vernacular of alge- 
braists, geometers, probabilists, and all other practitioners of our trade. These 
days, this vernacular, with all its vocabulary and ingrained mental habits, is 
being slowly replaced by the languages of category theory and homotopy the- 
ory and their higher extensions. Respectively, the basic “left-brain” intuition of 
sets, composed of distinguishable elements, is giving way to a new, more “right- 
brain” basic intuition dealing with spacelike and continuous primary images, 
both deformable and deforming. 


In Western ethnomathematics, truth is best understood as a central value, 
ever to be pursued, rather than anything achieved. Practical efficiency, authority, 
success in competition, faith, all other clashing values must recede in the mind
of a mathematician when he or she gets down to work. 

The most interesting intracultural interactions of mathematics are as well 
those that are not direct but rather proceed with the mediation of value systems. 

Coda. Every four years, mathematicians from all over the world meet at 
the International Congress of Mathematicians (ICM), to discuss whatever in- 
teresting developments happened recently in their domains of expertise. One of 
the traditions of these congresses is a series of lectures for the general public. 

In 1998, our congress met in Berlin, and Hans Magnus Enzensberger, the 
renowned poet and essayist, deeply interested in mathematics, spoke about 
“Zugbrücke außer Betrieb: Die Mathematik im Jenseits der Kultur”: the
drawbridge to the castle of mathematics is out of service. The main concern of
his talk was a deplorable lack of mathematical culture and communication 
between the general public and mathematicians, leading to alienation and 
mutual mistrust. 

At the end of his talk Enzensberger quotes an imaginary dialogue, where 
a mathematician is chatting with a fictional layman “Seamus Android” (see
I. Stewart. The Problems of Mathematics. Oxford Univ. Press, 1987). 

“Mathematician: It’s one of the most important discoveries of the last 
decade! 

Android: Can you explain it in words ordinary mortals can understand? 
Mathematician: Look, buster, if ordinary mortals could understand it, you 
wouldn’t need mathematicians to do the job for you, right? You can’t get a 
feeling for what’s going on without understanding the technical details. How can 
I talk about manifolds without mentioning that the theorems only work if the 
manifolds are finite-dimensional paracompact Hausdorff with empty boundary? 
Android: Lie a bit. 

Mathematician: Oh, but I couldn’t do that! 
Android: Why not? Everybody else does.” 


And here I must play God and say to both Android and Mathematician: 
“Oh, no! Don’t lie—because everybody else does.” 


III
1 The Problem: Results, Ideas


1.1. Cantor introduced two fundamental ideas in the theory of infinite sets: he
discovered (or invented?) the scale of cardinalities of infinite sets, and gave a
proof that this scale is unbounded. We recall that two sets $M$ and $N$ are said
to have the same cardinality (card $M$ = card $N$) if there exists a one-to-one
correspondence between them. We write card $M \le$ card $N$ if $M$ has the same
cardinality as a subset of $N$. We say that $M$ and $N$ are comparable if either
card $M \le$ card $N$ or card $N \le$ card $M$. We write card $M >$ card $N$ if
card $M \ge$ card $N$ but $M$ and $N$ do not have the same cardinality.


1.2. Theorem (Cantor, Schröder, Bernstein, Zermelo)


(a) Any two sets are comparable. If both card $M \le$ card $N$ and
card $N \le$ card $M$, then card $M$ = card $N$. In other words, the
cardinalities are linearly ordered.

(b) Let P(M) be the set of all subsets of M. Then card P(M) > card M. 
In particular, there does not exist a largest cardinality. 

(c) In any class of cardinalities there is a least cardinality. In other words, the 
cardinalities are well-ordered. 


PROOF. 

(a) Suppose $M$ has the same cardinality as the subset $M' \subset N$ and $N$
has the same cardinality as the subset $N_1 \subset M \cong M'$. We identify $M$
with $M'$. We then have three sets $N_1 \subset M \subset N$ and a one-to-one
correspondence $f : N \to N_1$. We must construct a one-to-one correspondence
$g : N \to M$. Here is an explicit definition of such a map:


$$
g(x) =
\begin{cases}
f(x), & \text{if } x \in f^n(N) \setminus f^n(M) \text{ for some } n \ge 0, \\
x, & \text{otherwise.}
\end{cases}
$$


Here $f^n(y) = f(f(\cdots f(y) \cdots))$ ($n$ times); $f^n(N) = \{f^n(y) \mid y \in N\}$,
and $f^0(y) = y$. We leave the verification that $g$ has the required properties
to the reader.
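
The explicit map $g$ can be tested on a concrete instance. The following sketch
is an illustration under assumptions that are not in the text: it takes
$N = \{0, 1, 2, \ldots\}$, $M = \{1, 2, 3, \ldots\}$, and $f(x) = x + 2$, so that
$N_1 = f(N) = \{2, 3, \ldots\} \subset M \subset N$. Since $f$ is injective,
$f^n(N) \setminus f^n(M) = f^n(N \setminus M)$, and membership is decided by
pulling $x$ back along $f$; in this example the loop always stops because each
step decreases $x$.

```python
def f(x):
    """Hypothetical one-to-one map f : N -> N_1, here x -> x + 2."""
    return x + 2


def in_M(x):
    """Hypothetical intermediate set M = {1, 2, 3, ...}."""
    return x >= 1


def in_chain(x):
    """True if x lies in f^n(N) minus f^n(M) for some n >= 0."""
    while True:
        if not in_M(x):   # x is in N but not in M: the case n = 0
            return True
        if x < 2:         # x is not in the image f(N): nothing to pull back
            return False
        x -= 2            # replace x by its preimage under f


def g(x):
    """The map g : N -> M from the proof of Theorem 1.2(a)."""
    return f(x) if in_chain(x) else x


print([g(x) for x in range(8)])   # prints: [2, 1, 4, 3, 6, 5, 8, 7]
```

In this instance $g$ shifts the even numbers by two and fixes the odd ones, so
its image is exactly $M$ and no value is taken twice.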


Yu. I. Manin, A Course in Mathematical Logic for Mathematicians, Second Edition, 105 
Graduate Texts in Mathematics 53, DOI 10.1007/978-1-4419-0615-1_ 3, 
© Yu. I. Manin 2010 
To prove that any two sets are comparable, it is sufficient to show that any
set can be well-ordered, since Lemma 5 of the appendix to Chapter II implies
that well-ordered sets are comparable to each other. Let $M$ be any set. For every
nonempty subset $N \subset M$ choose an element $c(N) \in N$. We call a
well-ordering $<$ of a subset $M' \subset M$ admissible (with respect to $c$) if
$c(M \setminus \hat{X}) = X$ for all $X \in M'$, where
$\hat{X} = \{Y \mid Y \in M',\ Y < X\}$.

We claim that if $M' \ne M''$ are two subsets of $M$ having admissible
well-orderings, then one set is an initial segment of the other, and the orderings
are compatible. In fact, as in Section 7(a) of the appendix to Chapter II, we
prove that the canonical isomorphism $f$ of, say, $M'$ with an initial segment of
$M''$ is the identity embedding: if $f(X) \ne X$ and $X$ is the least element with
this property, then


$$\widehat{f(X)} = \hat{X}, \quad X = c(M \setminus \hat{X}) \ \Rightarrow\ X = c(M \setminus \widehat{f(X)}) = f(X),$$


which is a contradiction.

It is now easy to see that the union $M'$ of all subsets of $M$ that have
a well-ordering admissible with respect to $c$ itself has an admissible
ordering; moreover, $M'$ coincides with $M$, since otherwise we could embed $M'$
in $M' \cup \{c(M \setminus M')\}$.

In particular, it follows that any set has the same cardinality as some ordinal, 
and hence the same cardinality as a unique cardinal. This justifies the use of the 
term “cardinality” and the use of cardinals as our standard scale of cardinalities 
(see Section 11 of the appendix to Chapter II).


(b) Since $P(M)$ contains all the one-element subsets of $M$, we have
card $P(M) \ge$ card $M$. In addition, any map $f : M \to P(M)$ cannot be
one-to-one (or even onto). In fact, we set


$$N = \{x \mid x \notin f(x)\} \in P(M),$$


and show that $N$ is not contained in the image of $f$. If there existed an
$n \in M$ such that $N = f(n)$, we would immediately obtain a contradiction by
considering the relationship of $n$ to $N$:


$n \in N \Rightarrow n \in f(n) \Rightarrow n \notin N$ by the definition of $N$;
$n \notin N \Rightarrow n \notin f(n) \Rightarrow n \in N$ by the definition of $N$.

This is Cantor’s famous “diagonal process.” 
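
For a finite set the diagonal process can be checked exhaustively. The sketch
below is an illustration only, with hypothetical names `power_set` and
`diagonal_set`: it runs through all $8^3 = 512$ maps $f : M \to P(M)$ for
$M = \{0, 1, 2\}$, represented as dictionaries, and confirms that the set
$N = \{x \mid x \notin f(x)\}$ is never a value of $f$.

```python
from itertools import combinations, product


def power_set(m):
    """All subsets of the finite set m, as frozensets."""
    items = sorted(m)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(m, f):
    """N = {x in M | x not in f(x)} for a map f given as a dict M -> P(M)."""
    return frozenset(x for x in m if x not in f[x])


M = {0, 1, 2}
subsets = power_set(M)

# Every map f : M -> P(M) is a choice of one subset for each element of M.
missed_every_time = all(
    diagonal_set(M, dict(zip(sorted(M), images))) not in images
    for images in product(subsets, repeat=len(M))
)
print(len(subsets), missed_every_time)   # prints: 8 True
```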


(c) The well-ordering of the cardinals is established at the same time as 
their comparability in the first stage of the theory of ordinals (see the 
Appendix to Chapter II). 


1.3. Remark. This proof of the lemma that any set can be well-ordered is 
essentially due to Zermelo. It was probably what prompted the most severe 
objections to the axiom of choice. The intuitive idea behind the proof reduces 
to a recipe for choosing one element after another from the set M until all of 
$M$ is exhausted. In this form it is immediately apparent that the prescription is
“physically” unthinkable, and to many of Zermelo’s contemporaries the whole
proof seemed to be nothing but a trick. For example, the idea of “first” choosing
an element $c(N)$ in each subset $N \subset M$ met with the following objection of
Lebesgue. If the elements we choose are not characterized by any special
properties, how do we know that we are always thinking about the same elements
throughout the proof? But today, except for specialists in the foundations of
mathematics, hardly any working mathematicians share these doubts.

We now formulate the basic problem that will concern us during the next
two chapters. We shall write card $P(M) = 2^{\operatorname{card} M}$ in analogy
to the finite case. The continuum is $2^{\omega_0}$.


1.4. The continuum problem. What place does the continuum occupy on the 
scale of cardinalities? 

By Theorem 1.2(b), we have $2^{\omega_0} > \omega_0$. Hence, in any case,
$2^{\omega_0} \ge \omega_1$. On the other hand, if $2^{\omega_0} > \omega_1$,
$2^{\omega_0} > \omega_2, \ldots, 2^{\omega_0} > \omega_n, \ldots$ for any $n$,
then we would have $2^{\omega_0} > \omega_\omega$, since the continuum cannot be
a union of countably many subsets of lower cardinality (König).


1.5. The continuum hypothesis (CH). $2^{\omega_0} = \omega_1$.
The generalized continuum hypothesis asserts that $2^{\operatorname{card} M}$
comes immediately after card $M$ for any infinite $M$. Here is what we know about
this question:


1.6. Theorem 


(a) The negation of the continuum hypothesis cannot be deduced from the other
axioms of set theory if those axioms are consistent (Gödel).

(b) The continuum hypothesis cannot be deduced from the other axioms of set 
theory if those axioms are consistent (Cohen). 


The same holds true for the generalized continuum hypothesis. 

If we grant that the axioms of set theory and the logical means of expression
and deduction in L1Set, which are implicit in the statement of Theorem 1.6,
actually exhaust the apparatus for constructing proofs in modern mathematics,
then we can say that the continuum problem is the first known example of
an absolutely undecidable problem. Although Gödel’s incompleteness theorem
provides concrete examples of undecidable propositions in any formal system
having reasonable properties, these examples can be decided in an “obvious”
way in some higher system. The situation with the continuum problem seems
much more difficult. If we agree that it is a meaningful question, then it can
be decided only by introducing a new principle of proof. Various possibilities
for doing this have been discussed, but none of the suggested new axioms for
set theory seems sufficiently convincing or, more important, sufficiently useful in
“real” mathematics. In the hundred years since the introduction of transfinite
induction, not a single new method of constructing sets has come into common
use (see, however, the end of IV.7 (added in the second edition)). Incidentally,
the basic idea in Gödel’s proof of Theorem 1.6(a) actually consists in verifying
that all the old methods allow us to construct at most $\omega_1$ subsets of $\omega_0$
(or, equivalently, at most $\omega_1$ real numbers).


1.7. Gödel’s idea. Gödel considers the basic set-theoretic operations (forming
pairs, products, complements, sums, and so on) and constructs the class of
all sets that are obtained by transfinite iteration of these operations, starting
from $\emptyset$. Such sets are called constructible sets. It is a priori completely
unclear whether all subsets of {0,1,2,...} are constructible, or, more generally,
whether all sets in the universe V are constructible. (It turns out that
this problem is formally undecidable to the same extent as the continuum
problem.) But we find that within the class of constructible sets, the number
of subsets of {0,1,2,...} is equal to $\omega_1$, most likely because we have omitted
a vast number of nonconstructible sets. Meanwhile, all the axioms of set
theory, restricted to this class, are true (in a reasonable meaning of “true”),
as are all deductions from these axioms. Hence the negation of the CH is not
deducible, since it is false in this model. The next chapter will be devoted to
Gödel’s theorem.


1.8. Cohen’s idea. We shall present this idea in the version due to Scott and
Solovay. First we give its application to a certain simplified problem, concerned
with a language weaker than L1Set; then in §§4-8 we present the application
to L1Set. For another version of Cohen’s idea, see §9.


We shall discuss the CH in the following form: there does not exist a subset
of the real numbers R whose cardinality is strictly between that of {0,1,2,...}
and that of R. In fact, if we had $2^{\omega_0} > \omega_1$, then any subset of R of cardinality
$\omega_1$ would have such an intermediate cardinality.


In order to show that this assertion is not deducible, which is equivalent to
Cohen’s theorem, it suffices to construct a model of the real numbers in which
all the axioms and all propositions deducible from them are fulfilled and in
which a set of intermediate cardinality exists. This model will be the set $\overline{R}$ of
random variables on a very big probability space $\Omega$. For a suitable choice of
$\Omega$, $\overline{R}$ will be so big that within the model there exists a set of intermediate
cardinality, containing N (the integers of the model) and contained in $\overline{R}$ (the
continuum of the model).


Of course, it cannot be quite this simple; there must be some obstacle to
carrying out this program. The obstacle is that almost all the properties of
R, including most of the axioms, turn out to be false for $\overline{R}$, so that $\overline{R}$ cannot
be a model for R in the usual sense of the word. Cohen’s basic idea was to
develop a method for overcoming this difficulty. He replaced the property of an
assertion being true by another property, which we shall temporarily call
“truth” in quotes, and which has the necessary formal properties. Namely, all
the axioms of R are “true” in $\overline{R}$, all deductions from “true” assertions using
the rules of logic again lead to “true” assertions, and the CH is not “true,” and
hence is not deducible from the axioms. We now show in greater detail how this
is done.


1.9. Let $I$ be a set of cardinality $> \omega_1$. We set


$\Omega = [0,1]^I$, with Lebesgue measure,
$\overline{R}$ = the set of random variables on $\Omega$
      = the set of measurable real-valued functions on $\Omega$.


1.10. Theorem 


(a) All the axioms of the real numbers and all deductions from them are “true”
for $\overline{R}$.
(b) The CH is not “true” for $\overline{R}$.


Here we say that an assertion $P$ about random variables $\bar x, \bar y, \ldots \in \overline{R}$ is “true”
if the following condition is fulfilled:


for each point $\omega \in \Omega$ we consider the values $\bar x(\omega), \bar y(\omega), \ldots$ of the random
variables $\bar x, \bar y, \ldots$ and form the assertion $P_\omega$ about these ordinary real
numbers; then for almost all $\omega \in \Omega$ (i.e., all but a set of measure 0) $P_\omega$
is true in the usual sense of the word.


Briefly, “truth” means experimental truth with probability one.


EXAMPLE. Let $P$ be the assertion that “R has no zero divisors,” i.e., “if $x, y \in R$
are such that $xy = 0$, then either $x = 0$ or $y = 0$.” Then the assertion “$\overline{R}$ has no
zero divisors” is, of course, not true. However, it is “true” because: if $\bar x, \bar y \in \overline{R}$
are such that $\bar x \bar y = 0$, then for almost all $\omega \in \Omega$ either $\bar x(\omega) = 0$ or $\bar y(\omega) = 0$.
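
The example can be tried out on a toy model. The sketch below is only an illustration under stated assumptions: it replaces the probability space by a hypothetical four-point space with equal weights (the names OMEGA, WEIGHT and almost_surely are ours, not the book's) and checks that two nonzero random variables can have product zero while the pointwise statement still holds with probability one.

```python
from fractions import Fraction

# Assumption: a four-point space with equal weights stands in for Omega.
OMEGA = [0, 1, 2, 3]
WEIGHT = {w: Fraction(1, 4) for w in OMEGA}


def measure(event):
    """Total weight of a set of sample points."""
    return sum(WEIGHT[w] for w in event)


def almost_surely(pointwise_claim):
    """'Truth' in quotes: the claim fails only on a set of measure 0."""
    failures = {w for w in OMEGA if not pointwise_claim(w)}
    return measure(failures) == 0


# Two random variables, stored as dicts from sample point to value.
x = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.0}
y = {0: 0.0, 1: 0.0, 2: 5.0, 3: 0.0}
product = {w: x[w] * y[w] for w in OMEGA}

# As functions, x and y are zero divisors: xy = 0 but x != 0 and y != 0.
product_is_zero = all(product[w] == 0 for w in OMEGA)
x_is_zero = all(x[w] == 0 for w in OMEGA)
y_is_zero = all(y[w] == 0 for w in OMEGA)

# Pointwise, "xy = 0 implies x = 0 or y = 0" holds at almost every point.
no_zero_divisors_is_true = almost_surely(
    lambda w: product[w] != 0 or x[w] == 0 or y[w] == 0
)
```

Here the literal statement about functions fails (x and y are zero divisors), while the pointwise statement holds at every sample point, which is the distinction between true and “true.”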


1.11. In order to give a precise meaning to the definition of “truth” and learn 
how to verify effectively the “truth” of rather complicated assertions, we must 
introduce a formal language, in this case the language of real numbers. This 
formal language is a mathematical object, and the precise formulation of 
Theorem 1.10 will concern this object, and not R or $\overline{R}$ at all.

The connection between this language and R is given by a system of 
informal recipes that tell how to translate the usual intuitive texts about R 
into this language, and by a system of theorems that tell us that the transla- 
tion is always possible and that the recipes are faithful to the informal texts. 
The role of $\overline{R}$ is reduced to that of an auxiliary construction that is used to define
and compute a special “truth” function on the formulas of the language. Thus 
we see the role of logic in the program. 


1.12. A detailed proof of Theorem 1.10 would be rather lengthy and nontrivial 
for several reasons. In the first place, a certain amount of space must be devoted 
to describing the formal language and the axioms of R in this language. We must 
then verify that all the axioms are “true” and that the CH is not “true”—this 
amounts to one or two dozen verifications, each of which involves an inductive 
argument with infinite sums and products in the Boolean algebra of measurable 
sets in $\Omega$. However, the most serious difficulties arise because the meaning of
every assertion changes considerably in going from R to $\overline{R}$, and not always in


110 III The Continuum Problem and Forcing 


a convenient direction. We shall illustrate this qualitative aspect by attempting 
to explain why the CH is not “true,” and why this is nontrivial. 

As we have said, we want to construct a subset $M$ of $\overline{R}$ having cardinality
intermediate between the cardinality of N and the cardinality of $\overline{R}$. We do this
as follows: For any $i \in I$, let the random variable $\bar x_i : [0,1]^I \to [0,1]$ be the
$i$th projection. Choose a subset $J \subset I$ such that $\omega_0 < \operatorname{card} J < \operatorname{card} I$ (this is
possible if $I$ is large), and set


$M = \{\bar x_j \mid j \in J\} \subset \overline{R}$.


Then $\operatorname{card} N < \operatorname{card} M < \operatorname{card} \overline{R}$ is true in the usual meaning of the
word. However, we must show that the corresponding assertion is “true” in our
Pickwickian sense. But then the role of the integers is assumed by the “locally
integral” random variables (whose values are integral with probability one), and
the set of these random variables can have cardinality much greater than $\omega_0$. Thus, the
required lower estimate for card $M$ becomes much more serious. Similarly, if we
formalize our naive description of $M$ and then interpret it in $\overline{R}$, then $M$ takes
on a new meaning, and leads to a much larger set than the “real” $M$. Thus, it
is also unclear that the upper inequality for card $M$ still holds. It seems almost
miraculous that everything eventually falls into place.

The plan for the rest of the chapter is as follows. In §2 and §3 we give a
(shortened) exposition of this abbreviated version of the theorem that the CH
is not deducible, for the second-order language of real numbers. If the reader
is interested only in the complete proof for L1Set, he may skip to §4, where we
introduce the Boolean-valued “universe of random sets,” which takes the place
of V. In §§5-7 we verify that the Zermelo-Fraenkel axioms are “true,” and in
§8 we verify that the CH is “false.” Finally, in §9 we discuss Cohen’s original
method, which is more syntactic and involves somewhat different intuitive ideas.


2 A Language of Real Analysis 


2.1. In this section we describe a formal language based on the theory of real 
numbers. In particular, this means that the variables x, y, z will be considered 
as names of real numbers. However, if we try to use a first-order language to 
formulate the assertions we are interested in, such as the continuum hypothesis 
CH, or even the completeness axiom (which differentiates the real numbers from 
the rational numbers), we find that we are not able to do this. In fact, in these 
assertions we need to refer to arbitrary subsets (or relations of degree one) of 
the real numbers, whereas first-order languages do not have symbols for variable 
relations (compare with Section 3.17 of Chapter I). 

This leads us to consider the second-order language L2Real, which is the 
most economical language in which the axioms and the CH can be expressed. 
We shall give a brief description of this language, for the most part noting only 
those features that show the connections with the real numbers and those that 
are peculiar to second-order languages.


2.2. The language L2Real. The alphabet consists of the variable symbols
$x, y, z, \ldots$; the symbols for degree-1 functions $f, g, h, \ldots$; the constants 0 and 1;
the degree-2 operations $+$ and $\cdot$; the degree-2 relations $=$ and $\leq$; and the same
connectives, quantifiers, and parentheses as in languages of $\mathcal{L}_1$. The terms are
$x, y, z, \ldots$ and 0 and 1; and also $f(t)$, $t_1 \cdot t_2$, and $t_1 + t_2$ if $f$ is a function symbol
and $t$, $t_1$, and $t_2$ are terms. The terms are names of real numbers.

The atomic formulas are $t_1 = t_2$ and $t_1 \leq t_2$, where $t_1$ and $t_2$ are terms. The
set of formulas is defined inductively exactly as in languages of $\mathcal{L}_1$, with one
addition: $\forall f(Q)$ and $\exists f(Q)$ are formulas if $Q$ is a formula and $f$ is the symbol
for a variable function. The notions of a free occurrence of a variable ($x$ or $f$),
of a closed formula, and so on carry over to L2Real in the obvious way. We shall
use the same type of abbreviated notation here as in Chapter I. The standard
interpretation of formulas that is implicit in the language should be obvious
from the definitions and from the following examples.


2.3. The formula $Z(y)$: “$y$ is an integer.” It is perhaps not completely obvious
how to write this formula. We can write, “$y$ can be obtained from 0 by repeatedly
adding or subtracting 1,” or else “any function $f$ that has period 1 and vanishes
at 0 must also vanish at $y$,” i.e.,


\[ Z(y):\quad \forall f\,\bigl( (f(0) = 0 \wedge \forall x\,(f(x) = f(x+1))) \Rightarrow f(y) = 0 \bigr). \]


2.4. The formula CH: “Any subset of R either has the same cardinality as R, 
or else is countable or finite.” 

We first restate the formula in different words: “Given a set of zeros of any
function $h$, either there exists a function $g$ mapping it onto all of R, or else there
exists a function $f$ mapping the integers onto all of this set.” We then have


\[ \mathrm{CH}:\quad \forall h\,\bigl( \exists g\,\forall y\,\exists x\,(h(x) = 0 \wedge y = g(x)) \;\vee\; \exists f\,\forall y\,(h(y) = 0 \Rightarrow \exists x\,(Z(x) \wedge y = f(x))) \bigr). \]


Notice that the formula Z(x) occurs as part of the CH. 
We further write the completeness axiom C: 


2.5. The formula C: “Any subset of R (the set of values of a function f) that is 
bounded from above has a least upper bound z.” We write 


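\[ C:\quad \forall f\,\bigl( \exists y\,\forall x\,(f(x) \leq y) \Rightarrow \exists z\,\forall y\,(\forall x\,(f(x) \leq y) \Leftrightarrow z \leq y) \bigr). \]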


All the other formulas we are interested in are simpler and do not require any 
special comment. 

We now give a precise definition of the property of “truth” for closed 
formulas in L2Real; this property was described informally in §1. We emphasize
that it is not an absolute property, but rather depends on the choice
of the probability space that is used to construct the “model” of the
real numbers.


2.6. The algebra of truth values. As in §1, we set


$I$ = a set;
$\Omega = [0,1]^I$ with Lebesgue measure;
$B$ = the algebra of measurable sets in $\Omega$ modulo sets of measure zero;
0 = the class of the empty set in $B$;
1 = the class of $\Omega$ in $B$.


We have the following operations in B: 


$a'$, the “complement” of the element $a \in B$;
$a \wedge b$, the “intersection” of two elements $a, b \in B$;
$a \vee b$, the “union” of two elements $a, b \in B$.


These operations satisfy the usual identities and give a Boolean algebra
structure on $B$. We write $a \leq b$ if $a \wedge b = a$.

Moreover, the operations of intersection and union extend uniquely to
infinite families of elements, and continue to satisfy the usual identities that
hold in the algebra of all subsets of any given set. We shall omit the verification
of all this. We note only that sets here are identified “modulo sets of measure
zero,” and that identities of the type $(A \bmod 0) \wedge (B \bmod 0) = (A \cap B) \bmod 0$
do not carry over to infinite families.

Finally, $B$ satisfies the following countable chain condition: if $a_\alpha \wedge a_\beta = 0$
for all distinct indices $\alpha$ and $\beta$, then $a_\alpha \neq 0$ for at most countably many
indices $\alpha$. This follows because Lebesgue measure is positive and additive. Technically
speaking, $B$ is a complete Boolean algebra with the countable chain condition.
The precise origin of $B$ and the fact that it has a measure play a less
important role.
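
As a finite illustration of working “modulo sets of measure zero,” the following sketch builds the analogue of $B$ for a hypothetical three-point space in which one point has weight zero. The space, the weights, and the helper names (cls, comp, meet, join, leq) are assumptions made for the example; nothing infinite, and in particular no chain condition, is being modeled.

```python
from fractions import Fraction

# Assumption: a three-point space in which the point "c" has measure zero.
WEIGHT = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
OMEGA = frozenset(WEIGHT)
NULL = frozenset(w for w in OMEGA if WEIGHT[w] == 0)


def cls(event):
    """Canonical representative of an event modulo sets of measure zero."""
    return frozenset(event) - NULL


ZERO = cls(set())   # the class of the empty set
ONE = cls(OMEGA)    # the class of the whole space


def comp(a):
    """Complement a' of a class."""
    return cls(OMEGA - a)


def meet(a, b):
    """Intersection of two classes."""
    return a & b


def join(a, b):
    """Union of two classes."""
    return a | b


def leq(a, b):
    """a <= b is defined by the identity (a meet b) = a."""
    return meet(a, b) == a


a = cls({"a", "c"})   # differs from {"a"} only by a null set
```

Events that differ by a null set give the same element, so cls({"a", "c"}) and cls({"a"}) coincide, and the complement laws hold for the classes.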


2.7. The interpretation set. We now introduce a large set $\overline{M}$, each point $\xi$ of
which corresponds to the assignment of certain values to all the symbols in the
alphabet of L2Real. If $\xi$ is fixed, each formula becomes a concrete statement
about measurable functions (random variables) on $\Omega$ and about functionals on
them (compare with §2 of Chapter II).

More precisely, we set


$\overline{R}$ = the set of measurable real-valued functions on $\Omega$;
$\overline{R}^{(1)}$ = the set of all possible maps $\bar f : \overline{R} \to \overline{R}$ that satisfy the condition


for all $\bar x, \bar y \in \overline{R}$:
$\{\omega \in \Omega \mid \bar x(\omega) = \bar y(\omega)\} \leq \{\omega \in \Omega \mid \bar f(\bar x)(\omega) = \bar f(\bar y)(\omega)\} \bmod 0$.


The definition of $\overline{R}^{(1)}$ has the following intuitive meaning. If we ignore the
“mod 0,” the condition simply means that the value of the random variable
$\bar f(\bar x)$ at each trial (each point in $\Omega$) must be determined by the value of $\bar x$ at
this trial. Of course, this is a very natural requirement if we want functions $\bar f$
to be adequate reflections of properties of ordinary real-valued functions in the
sense of §1. The addition of “mod 0” weakens this requirement by saying “with
conditional probability one.”

We now return to the set $\overline{M}$. A point $\xi \in \overline{M}$ consists of a choice of


$x^\xi \in \overline{R}$, for each variable symbol $x$;
$f^\xi \in \overline{R}^{(1)}$, for each symbol $f$ for a variable function.


Here is the interpretation of the expressions in the language that corresponds
to a given choice of $\xi$:


(a) Terms. Let $t$ be a term, and let $\xi \in \overline{M}$. Then $t^\xi \in \overline{R}$ is the random
variable that is defined inductively in the obvious way.

(b) The truth function $\|\ \|$ on atomic formulas. Let $P$ be the atomic formula
$t_1 \leq t_2$ or $t_1 = t_2$. Its truth value at a point $\xi \in \overline{M}$ is the element of the algebra
$B$ that is defined as follows:


\[ \|t_1 \leq t_2\|(\xi) = \{\omega \in \Omega \mid t_1^\xi(\omega) \leq t_2^\xi(\omega)\} \bmod 0, \]


and similarly for $t_1 = t_2$.

(c) The truth function $\|P\|(\xi)$ in the general case. The general definition
proceeds by induction. The rules when formulas are joined by connectives are
the same as in Section 5.7 of Chapter II:


\[ \|\neg P\| = \|P\|', \]
\[ \|P \vee Q\| = \|P\| \vee \|Q\|, \]
\[ \|P \wedge Q\| = \|P\| \wedge \|Q\|, \]
\[ \|P \Rightarrow Q\| = \|P\|' \vee \|Q\|, \]
\[ \|P \Leftrightarrow Q\| = (\|P\| \wedge \|Q\|) \vee (\|P\|' \wedge \|Q\|'). \]


Here, for brevity, we have omitted the $\xi$. Finally,


\[ \|\forall x\,P\|(\xi) = \bigwedge_{\xi'} \|P\|(\xi') \quad \text{(over all $\xi'$ that differ from $\xi$ only by a variation of $x$);} \]

\[ \|\exists x\,P\|(\xi) = \bigvee_{\xi'} \|P\|(\xi') \quad \text{(over the same $\xi'$);} \]


and similarly when we quantify over variable functions. Intuitively, the value of
the truth function of an assertion about random variables is the set of trials
mod 0 for which this assertion becomes true as a fact about real numbers.
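
The inductive rules above can be exercised on a small finite model. The sketch below is a toy under explicit assumptions: the sample space has three points, random variables take only the values 0 and 1, truth values are plain subsets of the sample space (no null sets), and only quantification over variables is modeled. All names in it are ours.

```python
from functools import reduce
from itertools import product

# Assumptions: three sample points; random variables take values in {0, 1}.
OMEGA = frozenset(range(3))
RANDOM_VARIABLES = [
    dict(zip(sorted(OMEGA), values))
    for values in product((0, 1), repeat=len(OMEGA))
]


def leq(s, t):
    """Truth value of the atomic formula s <= t."""
    return frozenset(w for w in OMEGA if s[w] <= t[w])


def eq(s, t):
    """Truth value of the atomic formula s = t."""
    return frozenset(w for w in OMEGA if s[w] == t[w])


def neg(a):
    return OMEGA - a


def implies(a, b):
    """||P => Q|| = ||P||' v ||Q||."""
    return neg(a) | b


def meet_all(elements):
    """Universal quantifier: intersection over all variations."""
    return reduce(lambda a, b: a & b, elements, OMEGA)


def join_all(elements):
    """Existential quantifier: union over all variations."""
    return reduce(lambda a, b: a | b, elements, frozenset())


one = {w: 1 for w in OMEGA}

# || forall x (x <= 1) ||
bounded = meet_all(leq(x, one) for x in RANDOM_VARIABLES)

# || forall x exists y (x <= y and not x = y) ||
has_larger = meet_all(
    join_all(leq(x, y) & neg(eq(x, y)) for y in RANDOM_VARIABLES)
    for x in RANDOM_VARIABLES
)

# An open formula evaluated at one particular interpretation.
x0 = {0: 0, 1: 1, 2: 1}
y0 = {0: 1, 1: 0, 2: 1}
open_value = implies(leq(x0, y0), eq(x0, y0))
```

In this toy model the two closed formulas evaluate to the whole space and to the empty set respectively, while the open formula takes an intermediate value that depends on the chosen interpretation.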


2.8. Lemma. If $P$ is a closed formula, then $\|P\|(\xi)$ does not depend on the
choice of $\xi \in \overline{M}$ and takes only the value 0 or 1.


This is proved by a simple induction on the length of $P$. It is just as easy
to prove a more general fact: if $P$ is any formula and $\xi$ and $\xi'$ do not differ
on variables that occur freely in $P$, then $\|P\|(\xi) = \|P\|(\xi')$. Compare with
Proposition 2.10 in Chapter II.

This value of $\|P\|(\xi)$ that is common for all $\xi$ if $P$ is closed can be denoted
simply by $\|P\|$. We are now ready to formulate the basic definition of this
section:


2.9. Definition. A formula $P$ in L2Real is said to be “true” if $\|P\|(\xi) = 1$ for
all $\xi \in \overline{M}$.


3 The Continuum Hypothesis Is Not Deducible in L2Real


3.1. Fundamental Lemma 


(a) “Truth” is preserved under the rules of deduction. 
(b) The first-order logical axioms and the versions of them in L2Real are “true.” 
(c) The special axioms of L2Real are “true.”
(d) The CH is not “true” if $\operatorname{card} I > \omega_1$.
This lemma implies the following theorem:


3.2. Theorem. The CH is not deducible from the axioms in L2Real. 

In this section we give those parts of the proof of the fundamental lemma 
that are also essential for the “real” Cohen theorem, as well as for our 
simplified problem. We note that Theorem 3.2 is weaker than Cohen’s 
theorem because the language L2Real contains fewer means of expression than 
the language of set theory. Although the continuum hypothesis can be stated 
in L2Real, because of Gödel’s general results we have no basis for expecting,
even if the CH were deducible, that the proof could also be given in this lan- 
guage. For example, the deduction could require us to introduce functionals of 
functions, functionals of functionals, and so on. The language of set theory, to
which we shall return in §4, contains the means for considering all of these finite 
and even transfinite levels at once. 


3.3. PROOF OF 3.1(a). If $\|P\| = 1$ and $\|P \Rightarrow Q\| = 1$, then $\|P\|' = 0$ and
$\|P\|' \vee \|Q\| = 1$, so that $\|Q\| = 1$. Secondly, if $\|P\| = 1$, then $\|P\|(\xi) = 1$ for all
$\xi \in \overline{M}$; but then (here $\xi'$ runs through all variations of $\xi$ along $x$)


\[ \|\forall x\,P\|(\xi) = \bigwedge_{\xi'} \|P\|(\xi') = \bigwedge_{\xi'} 1 = 1. \]


We similarly prove this for Gen over functions. 
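
The two steps of this argument are identities that hold in any Boolean algebra, so they can be checked exhaustively in a small one. The sketch below does this in the algebra of all subsets of a hypothetical three-element set; it illustrates the computation and does not replace the proof for the measure algebra $B$.

```python
from itertools import combinations

# Assumption: the algebra of all subsets of a three-element set.
OMEGA = frozenset(range(3))
ALGEBRA = [
    frozenset(c)
    for r in range(len(OMEGA) + 1)
    for c in combinations(sorted(OMEGA), r)
]


def implies(a, b):
    """||P => Q|| = ||P||' v ||Q||."""
    return (OMEGA - a) | b


def meet_all(elements):
    """Intersection of a family of elements of the algebra."""
    result = OMEGA
    for element in elements:
        result &= element
    return result


# Modus ponens: if ||P|| = 1 and ||P => Q|| = 1, then ||Q|| = 1.
modus_ponens_ok = all(
    q == OMEGA
    for p in ALGEBRA
    for q in ALGEBRA
    if p == OMEGA and implies(p, q) == OMEGA
)

# Generalization: an intersection of terms all equal to 1 is again 1.
generalization_ok = meet_all([OMEGA] * 5) == OMEGA
```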


3.4. PROOF OF 3.1(b) (SKETCH). 

Tautologies. Their “truth” is proved in §5 of Chapter II. 

Quantifier axioms. The proof proceeds by induction on the length of the 
formulas in the axiom schemes. Since it is completely straightforward, we shall 
omit it.


3.5. PROOF OF 3.1(c) (SKETCH). We shall list the axioms and make some brief
comments.

The special axioms of set theory: The axioms of equality and the axiom
(schema) of choice


\[ \mathrm{AC}:\quad \forall x\,\exists y\,P(x,y) \Rightarrow \exists f\,\forall x\,P(x, f(x)), \]


where $P$ is any formula which does not have any free variables except $x$ and $y$,
and where $f$ is free for $y$ in $P$.

The special axioms of field theory: The axioms of the additive group, the
axioms of the multiplicative group, and the distributivity of multiplication with
respect to addition.

The special order axioms:


\[ x \leq y \vee y \leq x, \]
\[ (x \leq y \wedge y \leq x) \Leftrightarrow x = y, \]
\[ x \leq y \Rightarrow (x + z \leq y + z), \]
\[ (x \leq y \wedge 0 \leq z) \Rightarrow xz \leq yz. \]

The completeness axiom (see 2.5). 

Among these axioms, the greatest effort is needed to verify that the axiom of 
choice and the completeness axiom are “true.” But these computations resemble 
those in the proof that the CH is false, which will be given in detail below. Hence, 
the verification of these two axioms will be omitted. 

The first axiom of equality is trivial. The second axiom is first verified for 
atomic formulas P, and then we use induction on the length of P. The argument 
is rather tedious, but simple. 

The axioms of an ordered field are verified without difficulty. We shall limit 
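ourselves to one example: “every nonzero number has an inverse,” i.e.


\[ \|\forall x\,(\neg(x = 0) \Rightarrow \exists y\,(xy = 1))\| = \bigwedge_{\bar x \in \overline{R}} \Bigl( \|\bar x = 0\| \vee \bigvee_{\bar y \in \overline{R}} \|\bar x \bar y = 1\| \Bigr). \]


To verify that this truth value equals 1, it suffices to prove this for each term
on the right, i.e., for each fixed $\bar x \in \overline{R}$. Then, in turn, for that $\bar x$ it suffices
to construct a random variable $\bar y \in \overline{R}$ such that $\|\bar x = 0\| \vee \|\bar x \bar y = 1\| = 1$.
We set


\[ \bar y(\omega) = \begin{cases} \bar x(\omega)^{-1}, & \text{if } \bar x(\omega) \neq 0, \\ 0, & \text{if } \bar x(\omega) = 0. \end{cases} \]

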
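
The pointwise inverse is easy to see in a finite setting. The following sketch, assuming a hypothetical four-point sample space and one sample random variable, builds $\bar y$ exactly by the rule just given and checks that the two truth values cover the whole space.

```python
# Assumption: four sample points and one hand-picked random variable x.
OMEGA = range(4)
x = [2.0, 0.0, -0.5, 0.0]

# y(w) = 1 / x(w) where x(w) != 0, and y(w) = 0 where x(w) = 0.
y = [1.0 / x[w] if x[w] != 0 else 0.0 for w in OMEGA]

# The truth values ||x = 0|| and ||xy = 1|| as sets of sample points.
x_is_zero = {w for w in OMEGA if x[w] == 0}
xy_is_one = {w for w in OMEGA if x[w] * y[w] == 1}

covers_everything = (x_is_zero | xy_is_one) == set(OMEGA)
```

The values were chosen so that the floating-point products are exact; with arbitrary reals one would compare up to a tolerance or use exact rationals.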
3.6. PROOF OF 3.1(d). We first recall the formula for the CH:


\[ \forall h\,\bigl( \exists g\,\forall y\,\exists x\,(h(x) = 0 \wedge y = g(x)) \;\vee\; \exists f\,\forall y\,(h(y) = 0 \Rightarrow \exists x\,(Z(x) \wedge y = f(x))) \bigr). \]


We let $P_1$ and $P_2$ denote the first and the second alternatives in this formula.
Thus, the CH has the form $\forall h\,(P_1 \vee P_2)$. We must prove that $\|\forall h\,(P_1 \vee P_2)\|(\xi) = 0$
for any point $\xi \in \overline{M}$. By the definition in 2.7,


\[ \|\forall h\,(P_1 \vee P_2)\|(\xi) = \bigwedge_{\xi'} \bigl( \|P_1\|(\xi') \vee \|P_2\|(\xi') \bigr), \]


where $\xi'$ runs through all variations of $\xi$ along $h$. To show that this value is 0,
it suffices to find a point $\xi'$ such that $\|P_1\|(\xi') = \|P_2\|(\xi') = 0$. Since all the
variables except $h$ are bound in $P_1$ and $P_2$, choosing $\xi'$ is equivalent to choosing
$h^{\xi'} = \bar h \in \overline{R}^{(1)}$. We shall give $\bar h$ explicitly; this will be a function “whose set of
zeros has intermediate cardinality.”

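To do this, as in §1 we fix a subset $J \subset I$ having cardinality strictly between
$\omega_0$ and $\operatorname{card} I$. Recall that for each $i \in I$, $\bar x_i \in \overline{R}$ is the “$i$th coordinate”
function. Further, for each random variable $\bar x \in \overline{R}$, we choose a subset $\Omega(\bar x) \subset \Omega$
such that


\[ \bigvee_{j \in J} \|\bar x = \bar x_j\| = \Omega(\bar x) \bmod 0 \]


(here we use the completeness of $B$). Finally, we define $\bar h \in \overline{R}^{(1)}$ as follows for
every $\bar x \in \overline{R}$ and $\omega \in \Omega$:


\[ \bar h(\bar x)(\omega) = \begin{cases} 0, & \text{if } \omega \in \Omega(\bar x), \\ 1, & \text{otherwise.} \end{cases} \]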
3.7. Correctness Lemma 


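(a) For fixed $\bar x$, $\bar h(\bar x)$ is measurable as a function of $\omega$, so that $\bar h$ maps $\overline{R}$ to $\overline{R}$.
(b) For every $\bar x \in \overline{R}$ we have


\[ \|\bar h(\bar x) = 0\| = \bigvee_{j \in J} \|\bar x = \bar x_j\|. \]


(c) $\bar h \in \overline{R}^{(1)}$ (see 2.7), so that there exists a point $\xi' \in \overline{M}$ for which $h^{\xi'} = \bar h$.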


PROOF. 


(a) $\bar h(\bar x)$ takes only the values 0 and 1 on $\Omega$, and the set where it takes each
of these two values is measurable by the definition and by the completeness
of $B$.

(b) is obvious from the definition.

(c) We must verify that for all $\bar x, \bar y \in \overline{R}$ we have


\[ \{\omega \in \Omega \mid \bar x(\omega) = \bar y(\omega)\} \leq \{\omega \in \Omega \mid \bar h(\bar x)(\omega) = \bar h(\bar y)(\omega)\} \bmod 0. \]

We shall show that the set of points $\omega \in \Omega$ for which both $\bar x(\omega) = \bar y(\omega)$ and
$\bar h(\bar x)(\omega) \neq \bar h(\bar y)(\omega)$ has measure zero.
It suffices to consider the case $\bar h(\bar x)(\omega) = 0$, $\bar h(\bar y)(\omega) = 1$, i.e., to show that


\[ \|\bar x = \bar y\| \wedge \|\bar h(\bar x) = 0\| \wedge \|\bar h(\bar y) = 1\| = 0. \]

We write the second term in the form $\bigvee_{j \in J} \|\bar x = \bar x_j\|$ (by 3.7(b)) and apply the
distributive axiom to the first and second terms (where we use the completeness
of $B$). We further use the fact that $\|\bar x = \bar y\| \wedge \|\bar x = \bar x_j\| \leq \|\bar y = \bar x_j\|$. We then
obtain


\[ \|\bar x = \bar y\| \wedge \|\bar h(\bar x) = 0\| \leq \bigvee_{j \in J} \|\bar y = \bar x_j\| = \|\bar h(\bar y) = 0\|, \]


which immediately gives us the required result.
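
Part (c) can be checked by brute force in a finite caricature of the construction. The sketch below is only a toy under stated assumptions: the index set has three elements, the sample space is $\{0,1\}^I$ in place of $[0,1]^I$, random variables take the values 0 and 1, and there are no null sets, so the zero set of $\bar h(\bar x)$ is taken to be exactly the union of the agreement sets. The names I, J, coordinate, agree and h are ours.

```python
from itertools import product

# Assumptions: a three-element index set, a two-element distinguished subset,
# and the finite space {0,1}^I standing in for [0,1]^I.
I = (0, 1, 2)
J = (0, 1)
OMEGA = list(product((0, 1), repeat=len(I)))


def coordinate(i):
    """The i-th coordinate function as a random variable."""
    return {w: w[i] for w in OMEGA}


# All random variables with values in {0, 1}: 2^8 = 256 of them.
RANDOM_VARIABLES = [
    dict(zip(OMEGA, values)) for values in product((0, 1), repeat=len(OMEGA))
]


def agree(x, y):
    """Truth value ||x = y|| as a set of sample points."""
    return frozenset(w for w in OMEGA if x[w] == y[w])


def h(x):
    """h(x) vanishes exactly where x agrees with some coordinate x_j, j in J."""
    zeros = frozenset().union(*(agree(x, coordinate(j)) for j in J))
    return {w: 0 if w in zeros else 1 for w in OMEGA}


IMAGES = [h(x) for x in RANDOM_VARIABLES]

# Condition from 2.7: wherever x and y agree, h(x) and h(y) agree as well.
extensional = all(
    agree(x, y) <= agree(hx, hy)
    for x, hx in zip(RANDOM_VARIABLES, IMAGES)
    for y, hy in zip(RANDOM_VARIABLES, IMAGES)
)

# The coordinate x_2 (index outside J) is a zero of h only on part of the space.
nonzero_points_for_x2 = sum(h(coordinate(2)).values())
```

In this toy model the coordinates with index in $J$ are zeros of $h$ everywhere, while the remaining coordinate is a zero only on part of the space, which gives a finite picture of a function being “partly included” among the zeros.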


Explanation. Since the choice of $\bar h$ is the essential step in the proof, we would
like to give some motivation for this choice. Recall that $h$ is the name of the
function the cardinality of whose set of zeros interests us. We choose a concrete
$\bar h$ to “disprove” the CH in such a way that the “almost everywhere zeros” of $\bar h$
include the elements of the set $\{\bar x_j \mid j \in J\}$, which has intermediate cardinality
in the naive sense of the word (compare with §1). However, $\bar h$ cannot be an
arbitrary map from $\overline{R}$ to $\overline{R}$; it must satisfy the strong condition $\bar h \in \overline{R}^{(1)}$.
Hence, along with all the $\bar x_j$, the almost everywhere zeros of $\bar h$ might also have
to include various other $\bar y \in \overline{R}$, and might have to “partly include” still other
$\bar z \in \overline{R}$. We say “partly include” to convey the possibility that $\|\bar h(\bar z) = 0\|$ is
neither 0 nor 1, so that $\bar z$ has a “certain probability” of being a zero of $\bar h$.

Thus, the “set of zeros” of $\bar h$ might be bigger than we want, and we might
expect to encounter difficulties in proving that this set cannot be mapped onto
all of R (the alternative $P_1$). On the other hand, it would seem that this situation
would make it trivial to disprove the alternative $P_2$ (mapping Z onto the entire
set of zeros). But even this is wrong! As we noted before, we can have $\|Z(\bar x)\| = 1$
for many $\bar x$ that are not constant integer functions on $\Omega$. Moreover, for still other
$\bar x$ we have $\|Z(\bar x)\| \neq 0, 1$, so that the “set of integers” in our model has grown
considerably.

A final remark: In this discussion we have been essentially dealing with
the concept of a “B-random set,” which will be a central idea in what follows
(see §4). That is, the “set of zeros of $h$” is random in the sense that for each
$\bar x \in \overline{R}$, the assertion “$\bar x \in$ (zeros of $h$)” is naturally assigned the Boolean truth
value $\|\bar h(\bar x) = 0\|$.

We now return to the proof that ||CH|| = 0. 


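3.8. PROOF THAT $\|P_1\|(\xi') = 0$. By the rules for computing truth functions,
we obtain


\[ \|P_1\|(\xi') = \bigvee_{\bar g} \bigwedge_{\bar y} \bigvee_{\bar x} \bigl( \|\bar h(\bar x) = 0\| \wedge \|\bar y = \bar g(\bar x)\| \bigr), \]


where $\bar h$ was defined above, $\bar g$ runs through all elements of $\overline{R}^{(1)}$, and $\bar x$ and $\bar y$ run
through all elements of $\overline{R}$. We suppose that $\|P_1\|(\xi') \neq 0$, and show that this
leads to a contradiction. We write the above formula for $\|P_1\|(\xi')$ as $\bigvee_{\bar g} a(\bar g)$.

If $\|P_1\|(\xi') \neq 0$, then $a(\bar g) \neq 0$ for some concrete function $\bar g \in \overline{R}^{(1)}$. We take
this function $\bar g$ and set


\[ a = \bigwedge_{\bar y} \bigvee_{\bar x} \Bigl( \bigvee_{j \in J} \|\bar x = \bar x_j\| \wedge \|\bar y = \bar g(\bar x)\| \Bigr). \]


Here we have substituted $\bigvee_{j \in J} \|\bar x = \bar x_j\|$ for $\|\bar h(\bar x) = 0\|$ using 3.7(b).
Furthermore, we obtain $\|\bar x = \bar x_j\| \wedge \|\bar y = \bar g(\bar x)\| \leq \|\bar y = \bar g(\bar x_j)\|$. Using this
and distributivity, we obtain


\[ a \leq \bigwedge_{\bar y} \bigvee_{j \in J} \|\bar y = \bar g(\bar x_j)\|. \]


In particular, for each $\bar x_i$ in place of $\bar y$, we have


\[ a \leq \bigvee_{j \in J} \|\bar x_i = \bar g(\bar x_j)\|. \]


If, as we have supposed, $a \neq 0$, then for each $i$ there exists a $j(i) \in J$ such that
$\|\bar x_i = \bar g(\bar x_{j(i)})\| \neq 0$.
Since $I$ is uncountable and $\operatorname{card} J < \operatorname{card} I$, it follows that there exists a
$j_0 \in J$ such that $j_0 = j(i)$ for all $i$ in an uncountable subset $I_0 \subset I$. But this
contradicts the countable chain condition on $B$, because the terms in the family
$\|\bar x_i = \bar g(\bar x_{j_0})\|$ ($i \in I_0$) are pairwise disjoint. In fact,


\[ \|\bar x_{i_1} = \bar g(\bar x_{j_0})\| \wedge \|\bar x_{i_2} = \bar g(\bar x_{j_0})\| \leq \|\bar x_{i_1} = \bar x_{i_2}\| = 0 \]


if $i_1 \neq i_2$.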

Notice to what extent this proof parallels the “naive” argument in §1. By 
assumption, the function $\bar g$ maps the zeros of $\bar h$ onto R “with nonzero probability.”
But the exact meaning of the computations cannot readily be stated in
words. 
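
The counting step in this proof, choosing a single $j_0$ that serves for a large set of indices $i$, is an infinite pigeonhole argument. A finite analogue can be sketched as follows; the assignment used here is an arbitrary illustrative choice, and largest_fiber is a hypothetical helper, not anything defined in the text.

```python
from collections import defaultdict


def largest_fiber(assignment):
    """Return a value j0 hit most often, with the list of indices mapped to it."""
    fibers = defaultdict(list)
    for i, j in assignment.items():
        fibers[j].append(i)
    j0 = max(fibers, key=lambda j: len(fibers[j]))
    return j0, fibers[j0]


# Assumption: ten indices i, each assigned one of three values j(i).
assignment = {i: i % 3 for i in range(10)}
j0, I0 = largest_fiber(assignment)
```

With ten indices and three possible values, some value must be taken at least four times; in the proof the same idea yields an uncountable subset $I_0$ on which $j(i)$ is constant.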

Computation of $\|Z(y)\|$. The formula for $Z(y)$, “$y$ is an integer,” was given
in 2.3. Since this formula occurs in $P_2$, we must compute $\|Z(y)\|$ in order to
compute $\|P_2\|$.


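3.9. Lemma. Let $\eta \in \overline{M}$ and $y^\eta = \bar y \in \overline{R}$. Then


\[ \|Z(y)\|(\eta) = \bigvee_{n \in Z} \|\bar y = n\| = \{\omega \in \Omega \mid \bar y(\omega) \in Z\} \bmod 0. \]


PROOF. We must show that


\[ \bigwedge_{\bar f} \Bigl( \|\bar f(0) = 0\|' \vee \Bigl( \bigvee_{\bar x} \|\bar f(\bar x) = \bar f(\bar x + 1)\|' \Bigr) \vee \|\bar f(\bar y) = 0\| \Bigr) = \bigvee_{n \in Z} \|\bar y = n\|. \]


We prove this equality by proving inequality in both directions.


The inequality $\leq$. It suffices to find a concrete function $\bar f \in \overline{R}^{(1)}$ for which
the corresponding term on the left is contained in the right-hand side. We
define $\bar f$ by setting $\bar f(\bar x)(\omega) = \sin^2 \pi \bar x(\omega)$ (here, instead of $\sin^2 \pi x$, we could
take any measurable function with period 1 and zeros only at the integers).
It is easy to see that $\bar f(\bar x) \in \overline{R}$ and $\bar f \in \overline{R}^{(1)}$. Then $\|\bar f(0) = 0\|' = 0$ and
$\|\bar f(\bar x) = \bar f(\bar x + 1)\|' = 0$. Hence we need only verify that


\[ \|\sin^2 \pi \bar y = 0\| \leq \bigvee_{n \in Z} \|\bar y = n\|, \]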


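and this is obvious.

The inequality $\geq$. It suffices to show that for any fixed values of $n \in Z$,
$\bar f \in \overline{R}^{(1)}$ and $\bar y \in \overline{R}$, we have


\[ \|\bar y = n\| \leq b \vee c, \]


where


\[ b = \|\bar f(0) = 0\|' \vee \Bigl( \bigvee_{\bar x} \|\bar f(\bar x) = \bar f(\bar x + 1)\|' \Bigr); \qquad c = \|\bar f(\bar y) = 0\|. \]


But the inclusion $a \leq b \vee c$ is equivalent to $a \wedge c' \leq b$. Furthermore, in our
situation, with $a = \|\bar y = n\|$, we have


\[ a \wedge c' = \|\bar y = n\| \wedge \|\bar f(\bar y) = 0\|' \leq \|\bar f(n) = 0\|'. \]

(Here $n$ in $\bar f(n)$ is the constant random variable that is everywhere equal
to $n$.)
It is thus sufficient to see that


\[ \|\bar f(n) = 0\|' \leq \|\bar f(0) = 0\|' \vee \Bigl( \bigvee_{\bar x} \|\bar f(\bar x) = \bar f(\bar x + 1)\|' \Bigr), \]


or, taking complements, that


\[ \|\bar f(n) = 0\| \geq \|\bar f(0) = 0\| \wedge \Bigl( \bigwedge_{\bar x} \|\bar f(\bar x) = \bar f(\bar x + 1)\| \Bigr). \]


The right side can only become larger if we take the intersection over just the
terms with $\bar x = 0, 1, 2, \ldots, n-1$ (for $n \geq 0$; for negative $n$ one takes
$\bar x = n, \ldots, -1$ instead). But this obviously gives


\[ \|\bar f(0) = 0\| \wedge \|\bar f(0) = \bar f(1) = \cdots = \bar f(n)\| \leq \|\bar f(n) = 0\|. \]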
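
The equivalence used in this proof, that $a \leq b \vee c$ holds exactly when $a \wedge c' \leq b$, is valid in every Boolean algebra. As a sanity check, the sketch below verifies it for all triples in the algebra of subsets of a hypothetical three-element set; this is an illustration, not a proof for $B$.

```python
from itertools import combinations, product

# Assumption: the algebra of all subsets of a three-element set.
OMEGA = frozenset(range(3))
ALGEBRA = [
    frozenset(c)
    for r in range(len(OMEGA) + 1)
    for c in combinations(sorted(OMEGA), r)
]


def comp(a):
    """Complement a' in the algebra."""
    return OMEGA - a


# For frozensets, <= is the subset relation, i.e. the order of the algebra.
equivalence_holds = all(
    (a <= (b | c)) == ((a & comp(c)) <= b)
    for a, b, c in product(ALGEBRA, repeat=3)
)
```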


3.10. PROOF THAT $\|P_2\|(\xi') = 0$. Using Lemma 3.9 and the rules for computing
truth functions, we find that


\[ \|P_2\|(\xi') = \bigvee_{\bar f} \bigwedge_{\bar y} \Bigl( \|\bar h(\bar y) = 0\|' \vee \bigvee_{\bar x} \Bigl( \bigvee_{n \in Z} \|\bar x = n\| \wedge \|\bar y = \bar f(\bar x)\| \Bigr) \Bigr). \]


Since $\bar f \in \overline{R}^{(1)}$ we have $\|\bar x = n\| \leq \|\bar f(\bar x) = \bar f(n)\|$, so that
$\|\bar x = n\| \wedge \|\bar y = \bar f(\bar x)\| \leq \|\bar y = \bar f(n)\|$.

Now it suffices to prove that the term corresponding to any concrete choice of
$\bar f$ is equal to 0. We suppose that this is not the case, and show that we obtain
a contradiction. Let $a \neq 0$ be the term corresponding to $\bar f$. By the previous
paragraph, we have


\[ a \leq \bigwedge_{\bar y} \Bigl( \|\bar h(\bar y) = 0\|' \vee \bigvee_{n \in Z} \|\bar y = \bar f(n)\| \Bigr). \]


In particular, for every $j \in J$ we must have (with $\bar x_j$ in place of $\bar y$)


\[ a \leq \bigvee_{n} \|\bar x_j = \bar f(n)\| \]


(where we have $\|\bar h(\bar x_j) = 0\|' = 0$ by 3.7(b)). Hence, for every $j$ there exists an
integer $n(j)$ such that $0 \neq \|\bar x_j = \bar f(n(j))\|$. Since $J$ is uncountable, there exist
an $n_0$ and an uncountable subset $J_0 \subset J$ such that $n(j) = n_0$ for all $j \in J_0$.
Then the $\|\bar x_j = \bar f(n_0)\|$ for $j \in J_0$ form an uncountable set of pairwise disjoint
nonzero elements of $B$. This contradicts the countable chain condition on $B$.


4 Boolean-Valued Universes 


4.1. In this section we fix a complete Boolean algebra $B$ (see 2.6) and construct
the universe $V^B$ of “B-random sets.” It will be a model for the Zermelo-Fraenkel
axioms in the same generalized sense in which the random variables $\overline{R}$ were a
model for the real numbers R in §3. In §§5-7 we verify that all the axioms of
L1Set are “true,” and then in §8 we verify that the continuum hypothesis is
“false” for a suitable choice of $B$.

The objects of $V^B$ will be denoted by capital letters $X, Y, Z, \ldots$. Any two
objects determine elements $\|X \in Y\| \in B$ and $\|X = Y\| \in B$. The intuitive
meaning, say, of the first of these is as follows: if $B$ is the algebra of measurable
sets in a probability space, then $\|X \in Y\|$ is the maximal set on which “$X$ is an
element of $Y$ with probability one.” Since we do not deal with probability measures
in the general case, we shall simply call the elements of $B$ “probabilities,”
and then $\|X \in Y\|$ is simply the probability that $X$ belongs to $Y$.

It is not trivial to construct precise definitions, because we want the axiom 
of extensionality to be “true.” If a random set must be uniquely determined 
by its elements (which are also random), even in a generalized sense, then this 
random set cannot be “too” random (see 4.3). 

We shall assume that as a set $B$ is an element of the von Neumann
universe V. Then all the objects of $V^B$ will also be elements of V, and all our
constructions can be expressed in L1Set. In principle, this allows us to take a
more formalistic point of view than we shall in fact take. The proof given below
of the independence of the CH could then be used as a guide for constructing
a much more syntactic version, based on an “internal interpretation” of the
language L1Set in itself. In this context the assumption that the Zermelo-Fraenkel
axioms are consistent in the statement of Theorem 1.6 becomes a
necessary precaution, since (by Gödel’s result) this consistency cannot be
established using only the language L1Set itself. However, in our treatment
this condition is pure hypocrisy, since by assuming the “existence” of the universe
V, which is a model for the axioms, we automatically “prove” that those
axioms are consistent (see Section 18 of the appendix to Chapter II).


4.2. Construction of $V^B$. For every ordinal $\alpha$ we construct the set $V_\alpha^B$ by
transfinite recursion, and then set $V^B = \bigcup_\alpha V_\alpha^B$. The first step is $V_0^B = \emptyset$.

Inductive assumption. The set $V_\alpha^B$ is defined for the ordinal $\alpha \ge 0$; for every
element $X \in V_\alpha^B$ the set $D(X) \subset V_\alpha^B$ is defined (its intuitive meaning will be
explained below); for every pair of elements $X, Y \in V_\alpha^B$ the “Boolean truth
functions”

$$\|X \in Y\| \in B, \qquad \|X = Y\| \in B$$

are defined (intuitively, they should be thought of as the “probability that
$X$ is an element of $Y$” and the “probability that $X$ coincides with $Y$,”
respectively).

By assumption, this data satisfies the following conditions: 


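(a) If $\beta_1 \le \beta_2 \le \alpha$, then $V_{\beta_1}^B \subset V_{\beta_2}^B$.
(b) If $\beta < \alpha$ and $X \in V_{\beta+1}^B \setminus V_\beta^B$, then $D(X) = V_\beta^B$.
(c1)
$$\|X \in Y\| = \bigvee_{Z \in D(Y)} \bigl(\|X = Z\| \wedge \|Z \in Y\|\bigr) \tag*{$(1)_\alpha$}$$

(the condition $(1)_\alpha$ expresses the requirement that the formula
$x \in y \Leftrightarrow \exists z\,(x = z \wedge z \in y)$, which is easily deduced from the
Zermelo-Fraenkel axioms, must be “true”).

(c2)
$$\|X = Y\| = \Bigl(\bigwedge_{Z \in D(X)} \|Z \in X\|' \vee \|Z \in Y\|\Bigr) \wedge \Bigl(\bigwedge_{Z \in D(Y)} \|Z \in Y\|' \vee \|Z \in X\|\Bigr) \tag*{$(2)_\alpha$}$$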


(this condition expresses the “truth” of the formula
$x = y \Leftrightarrow (\forall z\,(z \in x \Rightarrow z \in y) \wedge \forall z\,(z \in y \Rightarrow z \in x))$).
We note that it is not completely clear at this
point why, for example, in $(1)_\alpha$ we took the union only over $Z$ in $D(Y)$; it would
seem natural to take all $Z$. Later we shall see that the formula remains true if
we take the Boolean union over all $Z$.

This completes the description of the data for $V_\alpha^B$. We now give explicitly
the recursive construction of $V_{\alpha+1}^B$ and the corresponding data.

Definition of $V_{\alpha+1}^B$ and $D$. We set $V_{\alpha+1}^B = V_\alpha^B \cup V_\alpha^{B\prime}$, where $V_\alpha^{B\prime}$ consists
of all possible functions $Z$ with domain of definition $V_\alpha^B$ and range of
values $\subset B$ that satisfy the following “extensionality condition”:


$$\|X = Y\| \wedge Z(X) = \|X = Y\| \wedge Z(Y), \quad \text{for all } X, Y \in V_\alpha^B. \tag{3}$$


A little later we shall define $\|X \in Z\| = Z(X)$ for $X \in V_\alpha^B$ and $Z \in V_{\alpha+1}^B \setminus V_\alpha^B$.
Thus, as before, (3) can be thought of as reflecting the formula

$$(x = y \wedge x \in z) \Leftrightarrow (x = y \wedge y \in z).$$


Compare also with the comment in 2.7 concerning the definition of RY. 

We shall call the elements of $V_{\alpha+1}^B \setminus V_\alpha^B$ new elements (of rank $\alpha + 1$), and
we shall call the elements of $V_\alpha^B$ old elements. We set $D(Z) = V_\alpha^B$ if $Z$ is a new
element.


Definition of the Boolean truth functions. These functions have already 
been defined for pairs of old elements. We further set 


$$\|X \in Y\| = Y(X), \quad \text{if } X \text{ is old and } Y \text{ is new}; \tag{4}$$

$$\|X = Y\| = \Bigl(\bigwedge_{Z \in D(X)} \|Z \in X\|' \vee \|Z \in Y\|\Bigr) \wedge \Bigl(\bigwedge_{Z \in D(Y)} \|Z \in Y\|' \vee \|Z \in X\|\Bigr). \tag{5}$$


Because of $(2)_\alpha$, (5) automatically holds if $X$ and $Y$ are both old elements; in
the other cases, (5) uniquely determines $\|X = Y\|$ if we use (4) and the fact
that $Z$ runs only through old elements in (5). Finally, we set

$$\|X \in Y\| = \bigvee_{Z \in D(Y)} \|X = Z\| \wedge \|Z \in Y\| \tag{6}$$

if $X$ is a new element and $Y$ is either new or old. The right side is uniquely
determined using (4) and (5), since $D(Y) \subset V_\alpha^B$.


Formulas (4) and (6) show the following. As a first approximation we might
say that a random set $Y$ of rank $\alpha$ “consists” of sets $Z$ of lower rank that occur
in $Y$ with probability $Y(Z)$; these probabilities can be chosen rather arbitrarily,
subject only to the extensionality condition (3).

However, we then find (in formula (6) for new X and old Y) that we must 
automatically “include” more and more elements X in Y with probabilities 
already assigned by formula (6). It is conditions (3) and (6) that prevent our 
sets from being completely random. 


Definition of $V_\alpha^B$ and other data for limiting ordinals $\alpha$. We simply set
$V_\alpha^B = \bigcup_{\beta < \alpha} V_\beta^B$, and then all the other data has already been determined.


4.3. Verification that the definitions are correct. Properties 4.2 (a) and (b) are
obviously preserved in going from $\alpha$ to $\alpha + 1$; we must verify $(1)_{\alpha+1}$ and $(2)_{\alpha+1}$.
Now the only identity here that is not completely obvious is obtained by taking
$X$ old and $Y$ new in $(1)_{\alpha+1}$:

$$Y(X) = \bigvee_{Z \in V_\alpha^B} \|X = Z\| \wedge Y(Z).$$

This is verified as follows. We obtain $\ge$ by writing the right-hand side in the
form $\bigvee_Z \|X = Z\| \wedge Y(X)$ using (3). We obtain $\le$ by considering the term
with $Z = X$ and taking into account that $\|X = X\| = 1$ for all $X$ (as follows
immediately from (5)).

This completes the construction of the Boolean-valued universe. 


4.4. EXAMPLES AND REMARKS. We examine some special cases of these 
constructions in order to clarify their structure. 

(a) Obviously $V_1^B = \{\emptyset\}$, since there exists a unique “empty” function
whose domain of definition is the subset $V_0^B = \emptyset$. We compute $V_2^B = V_1^B \cup V_1^{B\prime}$.
We let $\{\emptyset\}_b \in V_1^{B\prime}$ denote the function on the one-element set $V_1^B$ that takes
the value $b \in B$. All these functions are extensional, so that

$$V_2^B = \{\emptyset, \{\emptyset\}_b \text{ for all } b \in B\}.$$

It follows from (4) that

$$\|\emptyset \in \{\emptyset\}_b\| = b.$$

It is clear from (5) that

$$\|\emptyset = \{\emptyset\}_b\| = b'.$$

Intuitively, these formulas mean that $\{\emptyset\}_b$ consists of one element $\emptyset$ “over $b$”
and is empty away from $b$. Again applying (5), we obtain

$$\|\{\emptyset\}_a = \{\emptyset\}_b\| = (a' \vee b) \wedge (a \vee b') = (a \wedge b) \vee (a' \wedge b').$$

Thus, $\{\emptyset\}_a$ and $\{\emptyset\}_b$ coincide when either they are both empty or they both
consist of one element $\emptyset$: this agrees with intuition. Now applying (6), we obtain

$$\|\{\emptyset\}_a \in \{\emptyset\}_b\| = \|\{\emptyset\}_a = \emptyset\| \wedge \|\emptyset \in \{\emptyset\}_b\| = a' \wedge b$$

(i.e., the only possible inclusion, which has the form $\emptyset \in \{\emptyset\}$, holds when $\{\emptyset\}_a$
is empty and $\{\emptyset\}_b$ is nonempty).

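Finally, let $X \in V_2^{B\prime}$ be an extensional function on the subset $V_2^B$ with
values in $B$. Then, by (6),

$$\|X \in \{\emptyset\}_b\| = \|X = \emptyset\| \wedge \|\emptyset \in \{\emptyset\}_b\| = \|X = \emptyset\| \wedge b,$$

and by (5)

$$\|X = \emptyset\| = \Bigl(\bigwedge_{a \in B} \|\{\emptyset\}_a \in X\|'\Bigr) \wedge \|\emptyset \in X\|' = \Bigl(\bigvee_{a \in B} \|\{\emptyset\}_a \in X\| \vee \|\emptyset \in X\|\Bigr)'.$$

Thus, intuitively, $\|X = \emptyset\|$ means the complement of the support of $X$ in $B$,
and $\|X \in \{\emptyset\}_b\|$ is the set where both $X$ is empty and $\{\emptyset\}_b$ is nonempty, which
again agrees with the usual formula $\emptyset \in \{\emptyset\}$. This shows how new objects $X$
can be random elements of old objects with nonzero probabilities.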
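The formulas of this example can be checked mechanically for a small algebra. The following is a minimal sketch, assuming that $B$ is the algebra of all subsets of a two-point space, encoded as 2-bit integers. The names `mem`, `eq`, `single` and the tuple encoding of random sets are illustrative choices and do not come from the text. For simplicity `mem` uses the union formula (1) for every pair; by 4.3 this agrees with (4) whenever the function $Y$ is extensional.

```python
# Assumption: B = all subsets of a two-point space, encoded as bitmasks 0..3.
TOP = 0b11

def comp(a):
    """Boolean complement a' inside B."""
    return TOP & ~a

# A random set is encoded as a tuple of (element, value) pairs,
# i.e. as a function on its domain D(X).
EMPTY = ()  # the unique function with empty domain

def mem(x, y):
    """||x in y||: join over z in D(y) of ||x = z|| /\\ y(z), as in (1) and (6)."""
    out = 0
    for z, yz in y:
        out |= eq(x, z) & yz
    return out

def eq(x, y):
    """||x = y|| computed by formula (5)."""
    out = TOP
    for z, xz in x:
        out &= comp(xz) | mem(z, y)
    for z, yz in y:
        out &= comp(yz) | mem(z, x)
    return out

def single(b):
    """{EMPTY}_b: the function on V_1 = {EMPTY} taking the value b."""
    return ((EMPTY, b),)

# Check the four formulas of 4.4(a) for every a, b in B.
for a in range(4):
    for b in range(4):
        assert mem(EMPTY, single(b)) == b
        assert eq(EMPTY, single(b)) == comp(b)
        assert eq(single(a), single(b)) == (a & b) | (comp(a) & comp(b))
        assert mem(single(a), single(b)) == comp(a) & b
```

The recursion terminates because every call passes to elements of strictly smaller rank, mirroring the transfinite recursion of 4.2 in the finite case.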

(b) We consider the case $B = \{0, 1\}$. The corresponding probability space
consists of one point, so our random sets become completely determined. What
happens is this: the universe $V^B$ maps naturally onto the von Neumann universe
$V$ in such a way that if $\bar X$ denotes the image of $X \in V^B$, then all $X$ and
$Y$ satisfy the conditions

$$\|X \in Y\| = 1 \Leftrightarrow \bar X \in \bar Y,$$
$$\|X = Y\| = 1 \Leftrightarrow \bar X = \bar Y.$$

To construct this map we first set $\bar\emptyset = \emptyset$. We now suppose that the map
$V_\alpha^B \to V_\alpha$ has already been constructed with the required properties, and
we extend the map to $\alpha + 1$. To do this, for any new element $X \in V_{\alpha+1}^B$ we
first find the subset of $V_\alpha^B$ on which $X$ takes the value 1, and we then take
the image of this subset in $V_\alpha$, which is an element $\bar X$ of $P(V_\alpha) = V_{\alpha+1}$; by
definition, our map takes $X$ to this $\bar X$. We leave the verification of the properties
of this map to the reader.
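For the first few finite stages the two conditions on the collapsing map can be verified by brute force. The sketch below assumes $B = \{0, 1\}$ and reuses the tuple encoding of random sets (a function is a tuple of (element, value) pairs); `next_stage`, `extensional`, and `collapse` are hypothetical helper names introduced only for this illustration.

```python
from itertools import product

TOP = 1  # Assumption: B = {0, 1}

def comp(a):
    return TOP & ~a

def mem(x, y):
    """||x in y|| via the union formula (1)."""
    out = 0
    for z, yz in y:
        out |= eq(x, z) & yz
    return out

def eq(x, y):
    """||x = y|| via formula (5)."""
    out = TOP
    for z, xz in x:
        out &= comp(xz) | mem(z, y)
    for z, yz in y:
        out &= comp(yz) | mem(z, x)
    return out

def extensional(f):
    """Condition (3) on the domain of f."""
    return all((eq(x, y) & fx) == (eq(x, y) & fy) for x, fx in f for y, fy in f)

def next_stage(old):
    """Old elements together with all extensional functions old -> B."""
    new = [tuple(zip(old, vals)) for vals in product((0, 1), repeat=len(old))]
    return old + [f for f in new if extensional(f)]

def collapse(x):
    """Image of x in the ordinary universe: keep the z with x(z) = 1."""
    return frozenset(collapse(z) for z, xz in x if xz == 1)

V = []
for _ in range(3):
    V = next_stage(V)  # V_1, V_2, V_3 in turn

for x in V:
    for y in V:
        assert (mem(x, y) == 1) == (collapse(x) in collapse(y))
        assert (eq(x, y) == 1) == (collapse(x) == collapse(y))
```

Distinct random sets may collapse to the same ordinary set (for instance $\emptyset$ and $\{\emptyset\}_0$), which is why the map is onto but not one-to-one.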


(c) Boolean truth functions for the formulas in $L_1\mathrm{Set}$.

We define these truth functions in an analogous manner to §2. We introduce
the interpretation class $M$: each point $\xi \in M$ assigns to every variable symbol
$x$ in $L_1\mathrm{Set}$ some object $x^\xi = X$ of the universe $V^B$. We further assume that
every point $\xi$ maps the symbol $\emptyset$ in $L_1\mathrm{Set}$ to the empty set.

If $P$ is the atomic formula $x \in y$ or $x = y$ in $L_1\mathrm{Set}$, then $\|P\|(\xi)$ is defined
to be $\|x^\xi \in y^\xi\| \in B$ or $\|x^\xi = y^\xi\| \in B$, respectively. The value of $\|P\|(\xi)$
for all other $P$ is defined inductively using exactly the same formulas as in
Section 2.7. We need only note that although the expressions $\bigvee_\xi a_\xi$ and $\bigwedge_\xi a_\xi$
must be taken over families indexed by the class $M$ when we compute with
quantifiers, all the different elements of such a family form a subset of $B$, so
that such an expression makes sense. We shall call a formula $P$ “true” (in the
model $V^B$) if $\|P\|(\xi) = 1$ for all $\xi$, and we shall call $P$ “false” if $\|P\|(\xi) = 0$ for
all $\xi$.

As in §3 of Chapter I, it can be verified that all the tautologies and logical 
quantifier axioms are “true” and that the rules of deduction preserve “truth.” 
Hence, it remains for us to show that the Zermelo—Fraenkel axioms are “true” 
(for any B) and that the continuum hypothesis is “false” (for suitable B). 


5 The Axiom of Extensionality Is “True” 
We begin by proving some relations between the truth functions. First of all, it
is clear from formula (5) in §4 that $\|X = Y\| = \|Y = X\|$ and $\|X = X\| = 1$.
The following lemma is a less immediate consequence of the formulas.


5.1. Lemma. For any $X, Y, Z \in V^B$ we have

$$\|X = Y\| \wedge \|Y = Z\| \le \|X = Z\|, \tag{I}$$
$$\|X = Y\| \wedge \|Y \in Z\| \le \|X \in Z\|, \tag{II}$$
$$\|X \in Y\| \wedge \|Y = Z\| \le \|X \in Z\|. \tag{III}$$


PROOF.
(a) (III) holds if $X \in D(Y)$. In fact, then by formula (5) in §4,

$$\|Y = Z\| \le \|X \in Y\|' \vee \|X \in Z\|,$$

so that if we intersect both sides with $\|X \in Y\|$, we obtain (III).
(b) (III) holds if $X, Y \in V_\alpha^B$ and $Z$ is a new element of $V_{\alpha+1}^B$. In fact, we
choose $U \in D(Y)$ and apply the special case of (III) proved in (a):

$$\|U \in Y\| \wedge \|Y = Z\| \le \|U \in Z\|.$$

We take the Boolean intersection of both sides with $\|X = U\|$ and then the
Boolean sum over all $U \in D(Y)$. Now applying formula (6) in §4 to the left-hand
side and using distributivity, we obtain

$$\|X \in Y\| \wedge \|Y = Z\| \le \bigvee_{U \in D(Y)} \|X = U\| \wedge \|U \in Z\| \le \bigvee_{U \in D(Z)} \|X = U\| \wedge \|U \in Z\| = \|X \in Z\|.$$


(c) (I) holds in $V_{\alpha+1}^B$ if (III) holds in $V_\alpha^B$. We consider an element
$U \in D(X) \subset V_\alpha^B$. By (a), we have

$$\|U \in X\| \wedge \|X = Y\| \le \|U \in Y\|.$$

We take the Boolean intersection with $\|Y = Z\|$:

$$\|U \in X\| \wedge \|X = Y\| \wedge \|Y = Z\| \le \|U \in Y\| \wedge \|Y = Z\|.$$

Here the right side is always $\le \|U \in Z\|$. In fact, if $Y \in V_\alpha^B$ this follows by part
(b) or by the induction assumption, and if $Y$ is a new element of $V_{\alpha+1}^B$, then it
follows by part (a).

We have thus shown that for all $X, Y, Z \in V_{\alpha+1}^B$ and all $U \in D(X)$,

$$\|U \in X\| \wedge \|X = Y\| \wedge \|Y = Z\| \le \|U \in Z\|.$$

Because $a \wedge b \le c$ implies $b \le a' \vee c$ in any Boolean algebra, we then obtain

$$\|X = Y\| \wedge \|Y = Z\| \le \|U \in X\|' \vee \|U \in Z\|,$$

and hence

$$\|X = Y\| \wedge \|Y = Z\| \le \bigwedge_{U \in D(X)} \|U \in X\|' \vee \|U \in Z\|.$$

Interchanging $X$ and $Z$, we find that

$$\|Z = Y\| \wedge \|Y = X\| \le \bigwedge_{U \in D(Z)} \|U \in Z\|' \vee \|U \in X\|.$$

These last two formulas, together with (5), clearly imply (I).


(d) (II) holds in $V_{\alpha+1}^B$ if (I) holds in $V_{\alpha+1}^B$. In fact, let $U \in D(Z)$. By (I),
we have

$$\|X = Y\| \wedge \|Y = U\| \le \|X = U\|.$$

We take the Boolean intersection with $\|U \in Z\|$ and then the Boolean sum over
all $U \in D(Z)$:

$$\|X = Y\| \wedge \Bigl(\bigvee_{U \in D(Z)} \|U \in Z\| \wedge \|Y = U\|\Bigr) \le \bigvee_{U \in D(Z)} \|X = U\| \wedge \|U \in Z\|.$$

Applying $(1)_{\alpha+1}$ in §4, we obtain (II).


(e) (III) holds in $V_{\alpha+1}^B$ if (II) holds in $V_{\alpha+1}^B$. In fact, let $U \in D(Y)$. By part
(a), we have

$$\|U \in Y\| \wedge \|Y = Z\| \le \|U \in Z\|.$$

Intersecting with $\|X = U\|$ and applying (II) to the right-hand side, we obtain

$$\|X = U\| \wedge \|U \in Y\| \wedge \|Y = Z\| \le \|X \in Z\|.$$

Finally, if we take the Boolean sum over all $U \in D(Y)$ and use formula (1) in
§4, we obtain (III).


Obviously, parts (a)-(e) prove the inductive step for $\alpha$ to $\alpha + 1$. We are now
in a position to establish the basic result of this section.
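Part (c) of the proof rests on the Boolean-algebra fact that $a \wedge b \le c$ implies $b \le a' \vee c$. As a sanity check, the sketch below tests this on one finite example, assuming $B$ is the algebra of subsets of a three-point set encoded as bitmasks; the helper names are illustrative only. On this example the two inequalities are in fact equivalent.

```python
from itertools import product

N = 3                  # Assumption: B = all subsets of a 3-point set
TOP = (1 << N) - 1

def comp(a):
    """Boolean complement a'."""
    return TOP & ~a

def leq(a, b):
    """a <= b in B."""
    return (a & b) == a

def shifts(a, b, c):
    """True when (a /\\ b <= c) and (b <= a' \\/ c) hold or fail together."""
    return leq(a & b, c) == leq(b, comp(a) | c)

all_ok = all(shifts(a, b, c) for a, b, c in product(range(TOP + 1), repeat=3))
print(all_ok)  # True
```

A finite check of this kind does not replace the one-line proof (join both sides of $a \wedge b \le c$ with $a'$), but it guards against sign errors when the rule is applied repeatedly.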


5.2. Proposition. The axiom of extensionality

$$x = y \Leftrightarrow \forall z\,(z \in x \Leftrightarrow z \in y)$$

is “true.”


PROOF. The formula $\|P \Leftrightarrow Q\|(\xi) = 1$ is equivalent to $\|P\|(\xi) = \|Q\|(\xi)$. It is
therefore sufficient to prove that for all $X, Y \in V^B$,

$$\|X = Y\| = \bigwedge_{Z \in V^B} \bigl(\|Z \in X\|' \vee \|Z \in Y\|\bigr) \wedge \bigl(\|Z \in X\| \vee \|Z \in Y\|'\bigr).$$

The inequality $\ge$ follows immediately from formula (2) in §4. To obtain
the opposite inequality, we write two obvious corollaries of formula (III) in
Lemma 5.1:

$$\|X = Y\| \le \|Z \in X\|' \vee \|Z \in Y\|,$$
$$\|X = Y\| \le \|Z \in X\| \vee \|Z \in Y\|',$$

and we take the intersection over all $Z$. The proposition is proved.


We note that formula (II) implies the following general extensionality property:
for all $X, Y, Z \in V^B$,

$$\|X = Y\| \wedge \|Y \in Z\| = \|X = Y\| \wedge \|X \in Z\|.$$
5.3. Corollary. The axioms of equality in $L_1\mathrm{Set}$ are “true.”

In fact (see Proposition 4.6 in Chapter II), the axioms of equality in our
case consist of the “true” formula $x = x$, the axiom of extensionality (in the
form $x = y \Rightarrow (P(x) \Rightarrow P(y))$ with $P(x) = z \in x$), and the “true” formula
$x = y \Rightarrow (x \in z \Rightarrow y \in z)$ (in which $P(x) = x \in z$), since the only atomic
formulas $P(x)$ in $L_1\mathrm{Set}$ are $z \in x$ and $x \in z$.


5.4. Remark. In most computations, we shall need to know only the values of
$\|X \in Y\|$ and $\|X = Y\|$, and not the precise definition of the objects $X$ and $Y$.
In this connection, we note that the following two binary relations on $V^B$
coincide (as easily follows from (III) and the axiom of extensionality):

(a) $\|X = Y\| = 1$,
(b) $\forall Z \in V^B$, $\|Z \in X\| = \|Z \in Y\|$.

We shall call such $X$ and $Y$ equivalent and write $X \sim Y$.


6 The Axioms of Pairing, Union, Power Set, and 
Regularity Are “True” 


6.1. The computations in the previous section show that the basic work in
ensuring that the axiom of extensionality is “true” was already incorporated
into the definition of the universe $V^B$. The explicit formulas for recursively
computing $\|X \in Y\|$ and $\|X = Y\|$ reflected so many special properties of
inclusion and equality that together they guaranteed that the general axiom
must hold.

In order to verify several of the other axioms, we must essentially define in
$V^B$ analogues of certain operations in $V$, such as forming the unordered pair
and the set of subsets. These operations can be defined by means of formulas in
$L_1\mathrm{Set}$. However, recall that if $P(x)$ is a formula with one free variable $x$, then
the $x^\xi \in V$ for which $P(x)(\xi)$ is true generally form a class and not a set.

It will be convenient to introduce the auxiliary notion of a “random class”
in $V^B$. Using this concept, we shall often construct the operations in $V^B$ in two
stages: the value of the operation will at first be a random class, which we then
“identify” with a random set using a separate argument.


6.2. Definition.

(a) A random class is any function $W$ on $V^B$ with values in $B$ that satisfies the
following extensionality condition:

$$W(X) \wedge \|X = Y\| = W(Y) \wedge \|X = Y\|, \quad \text{for all } X, Y \in V^B.$$

(b) A random class $W$ is said to be equivalent to a random set $Z \in V^B$ (written
$W \sim Z$) if

$$W(X) = \|X \in Z\|, \quad \text{for all } X \in V^B.$$
6.3. EXAMPLES AND REMARKS

(a) For any random set $Z$ the function $X \mapsto \|X \in Z\|$ is extensional by (II),
§5, and so is a random class. By analogy, we often write $\|X \in W\|$ instead of
$W(X)$ if $W$ is any random class.

(b) There exist random classes that are not equivalent to random sets. One
such example is the “universal” random class $W(X) = 1$ for all $X$. (If $W$ were
a set, we would have $\|W \in W\| = 1$, contradicting the regularity axiom, which
will be shown to be “true” below.)

(c) Let $W$ be a random class, and let $\alpha$ be any ordinal. We define the element
$W_\alpha \in V_{\alpha+1}^B$ as follows: $D(W_\alpha) = V_\alpha^B$, $W_\alpha$ = the restriction of $W$ to $V_\alpha^B$ (as a
function; see 4.2). It is easy to see that for all $X \in V^B$ we have

$$\|X \in W_\alpha\| \le \|X \in W\|. \tag{1}$$

In fact, let $U \in V_\alpha^B$ and $X \in V^B$. We then have

$$\|X = U\| \wedge W_\alpha(U) = \|X = U\| \wedge W(U) = \|X = U\| \wedge W(X) \le W(X),$$

so that by (6), §4,

$$\|X \in W_\alpha\| = \bigvee_{U \in V_\alpha^B} \|X = U\| \wedge W_\alpha(U) \le W(X) = \|X \in W\|.$$

We shall often show that some class $W$ in which we are interested is equivalent
to a set by finding an ordinal $\alpha$ such that $W \sim W_\alpha$. It is clear from (1)
that this follows if $\|X \in W\| \le \|X \in W_\alpha\|$ for all $X$.

(d) Let $W$, $W_1$, and $W_2$ be random classes. Then $W'$, $W_1 \wedge W_2$, and $W_1 \vee W_2$
are also random classes, since the extensionality condition is trivially verified
for these functions. We shall write $W_1 \cap W_2$ and $W_1 \cup W_2$ instead of $W_1 \wedge W_2$
and $W_1 \vee W_2$, respectively.

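(e) Let $W$ be a random class, and let $X$ be a random set. We show that
$W \cap X$ is equivalent to a random set. More precisely, if $D(X) = V_\alpha^B$, then
$W \cap X \sim (W \cap X)_\alpha$. In fact, for any $Y \in V^B$ it follows by (6), §4, that

$$\|Y \in (W \cap X)_\alpha\| = \bigvee_{U \in V_\alpha^B} \|U = Y\| \wedge \|U \in (W \cap X)_\alpha\|$$
$$= \bigvee_{U \in V_\alpha^B} \bigl(\|U = Y\| \wedge \|U \in W\|\bigr) \wedge \|U \in X\|$$
$$= \bigvee_{U \in V_\alpha^B} \|U = Y\| \wedge \|Y \in W\| \wedge \|U \in X\|$$
$$= \|Y \in W\| \wedge \|Y \in X\| = \|Y \in W \cap X\|.$$

This result implies that the separation axioms are “true” (see Section 4.9(b) of
Chapter II).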
The following proposition gives a general method for constructing random
classes.

6.4. Proposition. Let $P(x, y_1, \ldots, y_n)$ be a formula that does not contain any
free variables besides $x, y_1, \ldots, y_n$. Let $Y_1, \ldots, Y_n \in V^B$ be fixed. Then the
function

$$X \mapsto W(X) = \|P(X, Y_1, \ldots, Y_n)\|$$

is a random class.

Intuitively, $W$ contains every set $X$ with probability equal to the probability
that $P(X, Y_1, \ldots, Y_n)$ is true. $Y_1, \ldots, Y_n$ play the role of “constants.”

PROOF. We use the “truth” of the following axiom of equality:

$$\|\forall x\, \forall y\, \forall y_1 \cdots \forall y_n\, (x = y \Rightarrow (P(x, y_1, \ldots, y_n) \Rightarrow P(y, y_1, \ldots, y_n)))\| = 1.$$

If we take a point $\xi$ in the interpretation class that assigns to $x, y, y_1, \ldots, y_n$
the values $X, Y, Y_1, \ldots, Y_n$, respectively, then we find that

$$\|X = Y\| \le \|P(X, Y_1, \ldots, Y_n)\|' \vee \|P(Y, Y_1, \ldots, Y_n)\|,$$

or

$$\|X = Y\| \wedge W(X) \le W(Y),$$

so that $W$ is extensional.


We are now ready to verify the axioms. 


6.5. Proposition. The axiom of pairing

$$\forall u\, \forall w\, \exists x\, \forall z\,(z \in x \Leftrightarrow z = u \vee z = w)$$

is “true.”

PROOF. By definition we have

$$\|\forall u\, \forall w\, \exists x\, \forall z\,(z \in x \Leftrightarrow z = u \vee z = w)\| = \bigwedge_{U, W} \bigvee_{X} \bigwedge_{Z} \|Z \in X \Leftrightarrow Z = U \vee Z = W\|.$$

Hence to prove the theorem it suffices if for any $U, W \in V^B$ we find an $X \in V^B$
such that for all $Z \in V^B$,

$$\|Z \in X\| = \|Z = U\| \vee \|Z = W\|. \tag{2}$$

For fixed $U$ and $W$ we consider the right side of (2) as a function of $Z$. This
function is a random class $X$ by Proposition 6.4, since it corresponds to the
formula $z = U \vee z = W$. We show that it is equivalent to a random set; more
precisely, if $U, W \in V_\alpha^B$, then $X \sim X_\alpha$. By the remark at the end of 6.3(c), it
suffices to verify that for all $Z$

$$\|Z \in X\| \le \|Z \in X_\alpha\|.$$

But since $\|U \in X_\alpha\| = 1$, it follows by formula (II) in §5 that

$$\|Z = U\| \le \|Z \in X_\alpha\|,$$

and similarly

$$\|Z = W\| \le \|Z \in X_\alpha\|,$$

which gives the required inequality.


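6.6. Proposition. The axiom of union

$$\forall x\, \exists y\, \forall u\,(\exists z\,(u \in z \wedge z \in x) \Leftrightarrow u \in y)$$

is “true.”

PROOF. We fix $X \in V^B$ and construct a random set $Y$ such that for all $U \in V^B$,

$$\|U \in Y\| = \|\exists z\,(U \in z \wedge z \in X)\| = \bigvee_{Z \in V^B} \|U \in Z\| \wedge \|Z \in X\|.$$

By Proposition 6.4, there exists a random class $Y$ with this property. We show
that if $D(X) = V_\alpha^B$, then $Y \sim Y_\alpha$. Since $D(Y_\alpha) = D(X)$, we have

$$\|U \in Y_\alpha\| = \bigvee_{Z \in D(X)} \|U = Z\| \wedge \|Z \in Y_\alpha\| = \bigvee_{Z \in D(X)} \|U = Z\| \wedge \Bigl(\bigvee_{Z_1 \in V^B} \|Z \in Z_1\| \wedge \|Z_1 \in X\|\Bigr). \tag{3}$$

We show that the inner sum in (3) may be taken only over $Z_1 \in D(X)$. In fact,
for any $Z_1$,

$$\|Z_1 \in X\| = \bigvee_{Z_2 \in D(X)} \|Z_2 = Z_1\| \wedge \|Z_2 \in X\|,$$

so that

$$\|Z \in Z_1\| \wedge \|Z_1 \in X\| = \bigvee_{Z_2 \in D(X)} \|Z \in Z_1\| \wedge \|Z_2 = Z_1\| \wedge \|Z_2 \in X\| \le \bigvee_{Z_2 \in D(X)} \|Z \in Z_2\| \wedge \|Z_2 \in X\|. \tag{4}$$

Taking this into account, in (3) we first sum over $Z$ for fixed $Z_1 \in D(X)$.
Since $D(Z_1) \subset D(X)$, the sum over $Z \in D(X)$ coincides with the sum over
$Z \in D(Z_1)$, and is equal to $\|U \in Z_1\|$. Thus,

$$\|U \in Y_\alpha\| = \bigvee_{Z_1 \in D(X)} \|U \in Z_1\| \wedge \|Z_1 \in X\| \ge \bigvee_{Z_1 \in V^B} \|U \in Z_1\| \wedge \|Z_1 \in X\| = \|U \in Y\|,$$

by (4). Together with the remark at the end of 6.3(c), this gives $Y \sim Y_\alpha$.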
6.7. Proposition. The power set axiom

$$\forall x\, \exists y\, \forall z\,(z \subset x \Leftrightarrow z \in y)$$

is “true.” (Recall that $z \subset x$ is abbreviated notation for $\forall u\,(u \in z \Rightarrow u \in x)$.)

PROOF. We fix $X \in V^B$ and construct a $Y \in V^B$ such that for all $Z \in V^B$,

$$\|Z \in Y\| = \|Z \subset X\| = \bigwedge_{U \in V^B} \|U \in Z\|' \vee \|U \in X\|.$$

By Proposition 6.4, the right side defines $Y$ as a random class. We show that if
$D(X) = V_\alpha^B$, then $Y \sim Y_{\alpha+1}$.

We first construct the element $Z_\alpha \in V_{\alpha+1}^B$ by considering $Z$ as a random
class. By (1) we have $\|U \in Z_\alpha\|' \ge \|U \in Z\|'$, so that

$$\|Z \in Y\| \le \|Z_\alpha \in Y\| = \|Z_\alpha \in Y_{\alpha+1}\|. \tag{5}$$

If we prove the inequality

$$\|Z \in Y\| \le \|Z_\alpha = Z\|, \tag{6}$$

it will immediately follow from (5) and (6) that $Y \sim Y_{\alpha+1}$, since by (II), §5,

$$\|Z \in Y\| \le \|Z_\alpha \in Y_{\alpha+1}\| \wedge \|Z_\alpha = Z\| \le \|Z \in Y_{\alpha+1}\|.$$


It remains to verify (6). 
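First let $U \in D(X) = V_\alpha^B$. Then $\|U \in Z_\alpha\| = \|U \in Z\|$, so that
$\|U \in Z_\alpha \Leftrightarrow U \in Z\|' = 0$, and a fortiori

$$\|U \in X\| \wedge \|U \in Z_\alpha \Leftrightarrow U \in Z\|' = 0. \tag{7}$$

As $U$ varies, the left side of (7) determines a random class of the form $X \cap W$,
where $W$ corresponds to the formula $\neg(u \in Z_\alpha \Leftrightarrow u \in Z)$. Since $D(X) = V_\alpha^B$,
it follows by 6.3(e) that $X \cap W \sim (X \cap W)_\alpha$. But according to (7), $(X \cap W)_\alpha$
is the zero function on $V_\alpha^B$. Thus, $\|U \in X \cap W\| = 0$ for all $U \in V^B$. Consequently,

$$\|U \in X\| \le \|U \in Z_\alpha \Leftrightarrow U \in Z\| \quad \text{for all } U \in V^B. \tag{8}$$

To prove (6), we now write the left- and right-hand sides separately (using the
“truth” of the formula $Z_\alpha = Z \Leftrightarrow \forall u\,(u \in Z_\alpha \Leftrightarrow u \in Z)$):

$$\|Z \in Y\| = \bigwedge_{U \in V^B} \|U \in Z\|' \vee \|U \in X\|,$$
$$\|Z_\alpha = Z\| = \bigwedge_{U \in V^B} \|U \in Z_\alpha \Leftrightarrow U \in Z\|.$$

It is now clear that the inequality in (6) holds term by term. In fact, for $\|U \in X\|$
this follows from (8), and for $\|U \in Z\|'$ it follows because

$$\|U \in Z_\alpha \Leftrightarrow U \in Z\| = \bigl(\|U \in Z_\alpha\|' \vee \|U \in Z\|\bigr) \wedge \bigl(\|U \in Z_\alpha\| \vee \|U \in Z\|'\bigr)$$

and $\|U \in Z\|' \le \|U \in Z_\alpha\|'$ for all $U$.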


6.8. Proposition. The regularity axiom

$$\forall x\,(\exists y\,(y \in x) \Rightarrow \exists y\,(y \in x \wedge y \cap x = \emptyset))$$

is “true.”

PROOF. We fix $X \in V^B$. The axiom with the “constant” $X$ in place of $x$ has
the form $R \Rightarrow S$. We must show that $\|R \Rightarrow S\| = 1$. It suffices to prove that
$\|R\| \wedge \|S\|' = 0$, where

$$\|R\| = \bigvee_{Y \in V^B} \|Y \in X\|, \tag{9}$$
$$\|S\|' = \bigwedge_{Y \in V^B} \Bigl(\|Y \in X\|' \vee \Bigl(\bigvee_{Z \in V^B} \|Z \in Y\| \wedge \|Z \in X\|\Bigr)\Bigr). \tag{10}$$

We suppose that $\|R\| \wedge \|S\|' = a \ne 0$, and show that this leads to a contradiction.

It follows from (9) and (10) that there exists a $Y \in V^B$ such that $\|Y \in X\| \wedge a \ne 0$.
We choose $Y$ to have the least rank of any element with this property.
It is again clear from (9) and (10) that

$$\|Y \in X\| \wedge a \le \bigvee_{Z \in V^B} \|Z \in Y\| \wedge \|Z \in X\|.$$

On the right we may sum only over $Z \in D(Y)$, without changing the value of
the sum. Hence, there must exist a $Z \in D(Y)$ such that

$$\|Z \in X\| \wedge \|Y \in X\| \wedge a \ne 0,$$

so that $\|Z \in X\| \wedge a \ne 0$. But the rank of $Z$ is less than the rank of $Y$,
contradicting the choice of $Y$.


7 The Axioms of Infinity, Replacement, and 
Choice Are “True” 


7.1. We begin this section by describing two more methods for constructing
random sets. The first of them, which is very widely used, solves the following
problem. Suppose we are given a set of objects $X_i \in V^B$, $i \in I$, and a set of
elements $a_i \in B$. We would like to construct a random set $X$ that contains each
$X_i$ with probability $a_i$, but such an $X$ might not exist. However, it turns out
that there always exists an $X$ with $\|X_i \in X\| \ge a_i$ for all $i \in I$; moreover, there
exists a least $X$ with this property.


7.2. Lemma.

(a) Under the conditions in 7.1, the function $X$ of $Y$

$$\|Y \in X\| = \bigvee_{i \in I} a_i \wedge \|Y = X_i\| \tag{1}$$

is a random class $X$ that is equivalent to a random set. In addition,
$\|X_i \in X\| \ge a_i$, and if $X'$ is any random class such that $\|X_i \in X'\| \ge a_i$ for
each $i$, then $\|Y \in X'\| \ge \|Y \in X\|$ for all $Y$.
We shall say that $X$ (or the equivalent random set) collects the $X_i$ with
probabilities $a_i$.

(b) Under the same conditions, the function $Z$ of $Y$

$$\|Y \in Z\| = \bigvee_{i \in I} a_i \wedge \|Y \in X_i\| \tag{2}$$

is a random class $Z$ that is equivalent to a random set. If we also have
$a_i \wedge a_j = 0$ for all $i \ne j$, then $\|Z = X_i\| \ge a_i$, and for any random class
$Z'$ such that $\|Z' = X_i\| \ge a_i$ for each $i$, we have $\|Y \in Z'\| \ge \|Y \in Z\|$
for all $Y$.

We shall say that $Z$ glues together the $X_i$ with probabilities $a_i$.


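PROOF. It is easily verified that the functions $Z$ and $X$ defined by formulas (1)
and (2) are extensional.

There exists an ordinal $\alpha$ such that $X_i \in V_\alpha^B$ for all $i$. We show that $X \sim X_\alpha$
and $Z \sim Z_\alpha$. For any $Y \in V^B$ we have

$$\|Y \in X_\alpha\| = \bigvee_{U \in V_\alpha^B} \|Y = U\| \wedge \|U \in X_\alpha\|$$
$$= \bigvee_{U \in V_\alpha^B} \bigvee_{i} \|Y = U\| \wedge a_i \wedge \|U = X_i\|$$
$$= \bigvee_{U \in V_\alpha^B} \bigvee_{i} a_i \wedge \|Y = X_i\| \wedge \|U = X_i\|.$$

If we consider the term with $U = X_i$ on the right, we obtain $a_i \wedge \|Y = X_i\| \le
\|Y \in X_\alpha\|$, so that $\|Y \in X\| \le \|Y \in X_\alpha\|$ by (1), and the assertion follows
by 6.3(c).

Similarly, for any $Y \in V^B$ we have

$$\|Y \in Z_\alpha\| = \bigvee_{U \in V_\alpha^B} \bigvee_{i} \|Y = U\| \wedge a_i \wedge \|U \in X_i\| = \bigvee_{U \in V_\alpha^B} \bigvee_{i} a_i \wedge \|Y \in X_i\| \wedge \|Y = U\|.$$

Since $\|Y \in X_i\| = \bigvee_{U \in V_\alpha^B} \|Y = U\| \wedge \|Y \in X_i\|$, it follows that
$a_i \wedge \|Y \in X_i\| \le \|Y \in Z_\alpha\|$, and $\|Y \in Z\| \le \|Y \in Z_\alpha\|$ by (2).

Now let $X'$ and $Z'$ be any random sets with the properties in (a) and (b).
It is clear from (1) that $\|X_i \in X\| \ge a_i$. If $\|X_i \in X'\| \ge a_i$ for each $i$, then

$$\|Y \in X'\| = \bigvee_{U} \|Y = U\| \wedge \|U \in X'\| \ge \bigvee_{i} \|Y = X_i\| \wedge \|X_i \in X'\| \ge \|Y \in X\|$$

by (1).

Similarly, if $a_i \wedge a_j = 0$ for $i \ne j$, then it is clear from (2) that
$a_i \wedge \|Y \in Z\| = a_i \wedge \|Y \in X_i\|$, so that

$$a_i \wedge \|X_i = Z\| = \bigwedge_{Y} a_i \wedge \|Y \in X_i \Leftrightarrow Y \in Z\| = a_i$$

and $\|X_i = Z\| \ge a_i$. Now if $\|X_i = Z'\| \ge a_i$ for each $i$, then

$$\|Y \in Z'\| \ge \|Y \in Z'\| \wedge \|Z' = X_i\| = \|Y \in X_i\| \wedge \|Z' = X_i\| \ge a_i \wedge \|Y \in X_i\|,$$

so that $\|Y \in Z'\| \ge \|Y \in Z\|$.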
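The two constructions of Lemma 7.2 can be tried out on a toy case. The sketch below assumes $B$ is the algebra of subsets of a two-point space (2-bit masks), takes $X_0 = \emptyset$ and $X_1 = \{\emptyset\}_1$ with disjoint probabilities, and forms the restrictions of the classes (1) and (2) to $V_2^B$. The tuple encoding and the helper names `collect`, `glue`, and `join` are assumptions made for this illustration only.

```python
TOP = 0b11  # Assumption: B = all subsets of a two-point space

def comp(a):
    return TOP & ~a

def leq(a, b):
    return (a & b) == a

def join(values):
    out = 0
    for v in values:
        out |= v
    return out

EMPTY = ()  # random sets are tuples of (element, value) pairs

def mem(x, y):
    """||x in y|| via the union formula (1) of section 4."""
    return join(eq(x, z) & yz for z, yz in y)

def eq(x, y):
    """||x = y|| via formula (5) of section 4."""
    out = TOP
    for z, xz in x:
        out &= comp(xz) | mem(z, y)
    for z, yz in y:
        out &= comp(yz) | mem(z, x)
    return out

def single(b):
    return ((EMPTY, b),)

V2 = [EMPTY] + [single(b) for b in range(4)]  # all of V_2 for this B

def collect(xs, probs):
    """Restriction to V_2 of the class U -> \\/_i a_i /\\ ||U = X_i||."""
    return tuple((u, join(a & eq(u, x) for x, a in zip(xs, probs))) for u in V2)

def glue(xs, probs):
    """Restriction to V_2 of the class U -> \\/_i a_i /\\ ||U in X_i||."""
    return tuple((u, join(a & mem(u, x) for x, a in zip(xs, probs))) for u in V2)

xs = [EMPTY, single(TOP)]
probs = [0b01, 0b10]  # disjoint, as required for gluing
X = collect(xs, probs)
Z = glue(xs, probs)

for x, a in zip(xs, probs):
    assert leq(a, mem(x, X))  # ||X_i in X|| >= a_i
    assert leq(a, eq(Z, x))   # ||Z = X_i|| >= a_i
```

In this example $Z$ behaves like $\emptyset$ over the first point and like $\{\emptyset\}$ over the second, which is the intended meaning of gluing.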


Here is our first application of Lemma 7.2(a): 


7.3. Proposition. The axiom of infinity

$$\exists x\,(\emptyset \in x \wedge \forall u\,(u \in x \Rightarrow \{u\} \in x))$$

is “true.”

PROOF. When we proved that the axiom of pairing is “true,” we constructed
for any $U, W \in V^B$ an element $Z \in V^B$ (unique up to equivalence) with the
property that $\|Y \in Z\| = \|Y = U \vee Y = W\|$ for all $Y$. It is natural to let
$\{U, W\}^B$ denote this element $Z$, and let $\{U\}^B = \{U, U\}^B$.

We now verify the axiom of infinity. We set $X_0 = \emptyset$, $X_1 = \{\emptyset\}^B$, $\ldots$, $X_n =
\{X_{n-1}\}^B$, $\ldots$. Further, we let $X \in V^B$ be the element that collects all the $X_i$
with probabilities 1. We show that

$$\|\emptyset \in X \wedge \forall u\,(u \in X \Rightarrow \{u\} \in X)\| = 1.$$

It is obviously sufficient to prove that for all $U \in V^B$ we have $\|U \in X\| \le
\|\{U\}^B \in X\|$, that is, by (1),

$$\bigvee_{i=0}^{\infty} \|U = X_i\| \le \bigvee_{i=0}^{\infty} \|\{U\}^B = X_i\|.$$

In fact, since the formula $u = x \Rightarrow \{u\} = \{x\}$ is “true,” and since $X_{i+1} =
\{X_i\}^B$, it immediately follows that

$$\|U = X_i\| \le \|\{U\}^B = X_{i+1}\|.$$


7.4. Lemma. Let $W$ be a random class. Then there exists an element $X \in V^B$
such that

$$\bigvee_{U \in V^B} W(U) = W(X).$$

The left-hand side may be represented in the form $\|\exists x\,(x \in W)\| =
\|W \ne \emptyset\|$. Hence, intuitively, the lemma says that the probability that a given
class is nonempty coincides with the probability that a suitable element occurs
in it.

PROOF. We first show that there exists an ordinal $\beta$ such that $\bigvee_{U \in V_\beta^B} W(U) =
\bigvee_{U \in V^B} W(U)$. In fact, let $a_\gamma = \bigvee_{U \in V_\gamma^B} W(U)$, and for any $a \in B$ set
$\gamma(a) = \min(\gamma \mid a_\gamma \ge a)$ (or $\gamma(a) = 0$ if $a_\gamma \not\ge a$ for all $\gamma$). Finally, set
$\beta = \sup_{a \in B} \gamma(a)$. This is an ordinal, because $B$ is a set. If $\gamma > \beta$, then $a_\gamma \ge a_\beta$
by monotonicity, but we cannot have $a_\gamma > a_\beta$ because of the choice of $\beta$.

Thus, let $\bigvee_{U \in V_\beta^B} W(U) = \bigvee_{U \in V^B} W(U)$. We index all the elements in $V_\beta^B$ by
an initial segment of ordinals (by the axiom of choice!): $V_\beta^B = \{U_\alpha\}_{\alpha \in I}$. We set

$$a_\alpha = W(U_\alpha) \wedge \Bigl(\bigvee_{\gamma < \alpha} W(U_\gamma)\Bigr)', \quad \alpha \in I.$$

Obviously $a_\alpha \wedge a_\gamma = 0$ for $\alpha \ne \gamma$. Using Lemma 7.2(b), we glue together the sets
$U_\alpha$ with probabilities $a_\alpha$ ($\alpha \in I$). We obtain a set $X$ satisfying the conditions
$\|X = U_\alpha\| \ge a_\alpha$, where $a_\alpha \le W(U_\alpha)$. Using the extensionality of $W$, we obtain

$$W(X) \ge \bigvee_{\alpha \in I} \|X = U_\alpha\| \wedge W(U_\alpha) \ge \bigvee_{\alpha \in I} a_\alpha = \bigvee_{\alpha \in I} W(U_\alpha) = \bigvee_{U \in V^B} W(U).$$
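The passage from the values $W(U_\alpha)$ to the pairwise disjoint $a_\alpha$ is the usual disjointification of an indexed family. A minimal sketch for a finite family, assuming $B$ is the algebra of subsets of a four-point set encoded as bitmasks (the function name `disjointify` is hypothetical):

```python
TOP = 0b1111  # Assumption: B = all subsets of a 4-point set

def disjointify(ws):
    """Return a_k = w_k /\\ (w_0 \\/ ... \\/ w_{k-1})' for a finite family ws."""
    seen, out = 0, []
    for w in ws:
        out.append(w & TOP & ~seen)  # remove everything covered earlier
        seen |= w
    return out

ws = [0b0011, 0b0110, 0b0100, 0b1001]
a = disjointify(ws)
print([format(v, "04b") for v in a])  # ['0011', '0100', '0000', '1000']
```

The resulting elements are pairwise disjoint and have the same join as the original family, which is exactly what the final chain of inequalities in the proof uses.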
acl acl UEeVv® 


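7.5. Proposition. The replacement axiom

$$\forall \bar z\, \forall u\, \bigl(\forall x\,(x \in u \Rightarrow \exists! y\, P(x, y, \bar z)) \Rightarrow \exists w\, \forall y\,(y \in w \Leftrightarrow \exists x\,(x \in u \wedge P(x, y, \bar z)))\bigr)$$

is “true” (here $\bar z = (z_1, \ldots, z_n)$).

PROOF. We fix a “vector” $\bar Z = (Z_1, \ldots, Z_n)$ with $Z_i \in V^B$ and an element
$U \in V^B$. We shall write $P(x, y)$ instead of $P(x, y, \bar Z)$. If we write the axiom
with the “constants” $Z_i$ and $U$ in the form $R \Rightarrow S$, then we must prove that
$\|R \Rightarrow S\| = 1$.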


7.6. The special case: If $\|R\| = 1$, then $\|S\| = 1$.

We first show how the general case follows from this special case. Let $a \in B$,
and let $B_a$ denote the set $\{b \in B \mid b \le a\}$. The operations on $B$ induce a
Boolean algebra structure on $B_a$ with unit element $1_a = a$. The natural mapping
$B \to B_a$: $b \mapsto b \wedge a$ is a homomorphism. An easy induction on $\alpha$ allows us to
construct a surjective map of universes $V^B \to V^{B_a}$: $X \mapsto X_a$ such that for all
$X, Y \in V^B$ we have

$$\|X_a \in Y_a\| = \|X \in Y\| \wedge a,$$
$$\|X_a = Y_a\| = \|X = Y\| \wedge a.$$

Now, to prove Proposition 7.5 from the special case 7.6, we choose $a = \|R\|$.
Then $\|R\|_a = 1_a$, so that 7.6 implies that $\|S\|_a = 1_a$. This means that $\|S\| \ge a$,
and hence $\|R \Rightarrow S\| = 1$. (Here we have used 7.6 in $V^{B_a}$; clearly $\|R\|_a = \|R_a\|$,
where $R_a$ is the obvious image of $R$ in $V^{B_a}$.)
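The claim that $b \mapsto b \wedge a$ is a homomorphism $B \to B_a$ can be confirmed by brute force on a finite example. The sketch assumes $B$ is the algebra of subsets of a three-point set; the complement in $B_a$ is taken relative to the unit $a$, and the names `restrict` and `is_homomorphism` are illustrative.

```python
from itertools import product

N = 3
TOP = (1 << N) - 1  # Assumption: B = all subsets of a 3-point set

def comp(b, unit=TOP):
    """Complement of b relative to the given unit (unit = a gives B_a)."""
    return unit & ~b

def restrict(b, a):
    """The natural map B -> B_a, b -> b /\\ a."""
    return b & a

def is_homomorphism(a):
    """Check that restriction to a preserves meet, join, complement, 0 and 1."""
    for b, c in product(range(TOP + 1), repeat=2):
        if restrict(b & c, a) != restrict(b, a) & restrict(c, a):
            return False
        if restrict(b | c, a) != restrict(b, a) | restrict(c, a):
            return False
        if restrict(comp(b), a) != comp(restrict(b, a), unit=a):
            return False
    return restrict(TOP, a) == a and restrict(0, a) == 0

print(all(is_homomorphism(a) for a in range(TOP + 1)))  # True
```

Because the map preserves all Boolean operations, truth values computed in $V^{B_a}$ are the values computed in $V^B$ cut down by $a$, which is what the reduction to the special case needs.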


7.7. PROOF OF 7.6. The condition $\|R\| = 1$ means that for any $X \in V^B$,

$$\|X \in U\| \le \|\exists! y\, P(X, y)\|. \tag{3}$$

To show that $\|S\| = 1$, it is sufficient if given $U \in V^B$, we find a $W \in V^B$ such
that for all $Y \in V^B$,

$$\|Y \in W\| = \bigvee_{X \in V^B} \|X \in U\| \wedge \|P(X, Y)\|. \tag{4}$$

It follows from 6.4 that the formula (4) defines $W$ as a random class. We find
an ordinal $\alpha$ such that $W \sim W_\alpha$.

To do this, we first note that in (4) we may take the sum only over $X \in D(U)$:

$$\|Y \in W\| = \bigvee_{X \in D(U)} \|X \in U\| \wedge \|P(X, Y)\| \tag{5}$$

(the argument here is the same as after formula (3) in §6). We now apply
Lemma 7.4 to the class $W_X(Y) = \|P(X, Y)\|$. It follows that for every $X \in
D(U)$ there exists an element $Y_X \in V^B$ such that

$$\|\exists y\, P(X, y)\| = \|P(X, Y_X)\|. \tag{6}$$

(Because $\|\exists! y\, P(X, y)\| \le \|\exists y\, P(X, y)\|$, we can use these $Y_X$ to estimate
$\|X \in U\|$ with the help of (9) below.)

We set $\alpha_X = \min(\alpha \mid Y_X \in V_\alpha^B)$, and

$$\alpha = \sup(\alpha_X \mid X \in D(U)),$$

and then show that $W \sim W_\alpha$ for this $\alpha$. We must verify that $\|Y \in W\| \le
\|Y \in W_\alpha\|$ for every $Y$. By (5) and by formula (II) in §5, this follows if for any
$X \in D(U)$ we have

$$\|X \in U\| \wedge \|P(X, Y)\| \le \|Y = Y_X\| \wedge \|Y_X \in W_\alpha\|. \tag{7}$$

In the first place, by (3), (6), (5), and the definition of $\alpha$, we have

$$\|X \in U\| \le \|P(X, Y_X)\|, \qquad \|X \in U\| \le \|Y_X \in W\| = \|Y_X \in W_\alpha\|. \tag{8}$$

Further, we consider the following formula, which is “true” because it is
deducible from the logical axioms and the axioms of equality:

$$\forall x\,(\exists! y\, P(x, y) \wedge P(x, y_1) \wedge P(x, y_2) \Rightarrow y_1 = y_2).$$

We thereby obtain

$$\|\exists! y\, P(X, y)\| \wedge \|P(X, Y)\| \wedge \|P(X, Y_X)\| \le \|Y = Y_X\|. \tag{9}$$

Finally, it follows from (3), (8), and (9) that

$$\|X \in U\| \wedge \|P(X, Y)\| \le \|Y = Y_X\| \wedge \|Y_X \in W_\alpha\|,$$

i.e., we have (7).
7.8. Proposition. The axiom of choice is “true.”

PROOF. Recall that the axiom of choice has the form $\forall x\, \exists y\,(Q \wedge R \wedge S \wedge T)$,
where

$Q$ denotes $\forall z\,(z \in y \Rightarrow \exists u\, \exists w\,(z = \langle u, w \rangle))$ (“$y$ is a binary relation”);

$R$ denotes $\forall u\, \forall w_1\, \forall w_2\,(\langle u, w_1 \rangle \in y \wedge \langle u, w_2 \rangle \in y \Rightarrow w_1 = w_2)$ (“$y$ is a
function”);

$S$ denotes $\forall u\,(\exists w\,(\langle u, w \rangle \in y) \Rightarrow u \in x)$ (“the domain of definition of $y$
is contained in $x$”);

$T$ denotes $\forall u\,(u \ne \emptyset \wedge u \in x \Rightarrow \exists w\,(w \in u \wedge \langle u, w \rangle \in y))$ (“the domain
of definition of $y$ coincides with $x$, and $y$ chooses one element
from each nonempty element of $x$”).

We fix $X \in V^B$ and construct the corresponding “choosing function” $Y$. To
do this:

(a) We index $D(X)$ by an initial segment of ordinals:

$$D(X) = \{U_0, U_1, \ldots, U_\alpha, \ldots\}, \quad \alpha \in I.$$

(b) For each $U_\alpha \in D(X)$ we use Lemma 7.4 to find an element $W_\alpha \in V^B$
such that

$$\|W_\alpha \in U_\alpha\| = \bigvee_{W \in V^B} \|W \in U_\alpha\|.$$

(c) For each $\alpha \in I$ we set

$$a_\alpha = \|U_\alpha \in X\| \wedge \bigwedge_{\beta < \alpha} \bigl(\|U_\beta \in X\|' \vee \|U_\beta = U_\alpha\|'\bigr).$$

(d) Finally, we let $Y$ denote the set that collects the “ordered pairs”
$\langle U_\alpha, W_\alpha \rangle^B$ with probabilities $a_\alpha$, $\alpha \in I$. Here, of course,
$\langle U, W \rangle^B = \{\{U\}^B, \{U, W\}^B\}^B$.

The idea of this construction is as follows. In each $U_\alpha$ we choose the element
$W_\alpha$ that belongs to $U_\alpha$ “with the largest possible probability.” We then put
together the graph of the choice function $Y$ from the “pairs” $\langle U_\alpha, W_\alpha \rangle^B$,
where we take the pairs in the order they are indexed, but include a given
$\langle U_\alpha, W_\alpha \rangle^B$ only to the extent that $U_\alpha$ “was not already considered earlier as
belonging to $X$.”

We now substitute $X$ and $Y$ in place of $x$ and $y$ in the axiom of choice,
and, letting $Q$, $R$, $S$, and $T$ now denote the corresponding formulas with these
constants, we show that $\|Q\| = \|R\| = \|S\| = \|T\| = 1$. We shall constantly be
using the following formula, which follows from (1) and the definition of $Y$:

$$\|Z \in Y\| = \bigvee_{\alpha} \|Z = \langle U_\alpha, W_\alpha \rangle^B\| \wedge a_\alpha. \tag{10}$$


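7.9. $\|Q\| = 1$. By the definition of $Q$, this means that for all $Z \in V^B$ we must
have

$$\|Z \in Y\| \le \bigvee_{U, W} \|Z = \langle U, W \rangle^B\|,$$

but this is obvious from (10).

7.10. $\|R\| = 1$. By the definition of $R$, for any $U, W^1, W^2 \in V^B$ we must prove
the inequality

$$\|\langle U, W^1 \rangle^B \in Y\| \wedge \|\langle U, W^2 \rangle^B \in Y\| \le \|W^1 = W^2\|.$$

Using (10), we rewrite the left-hand side in the form

$$\bigvee_{\alpha, \beta} \|U = U_\alpha\| \wedge \|W^1 = W_\alpha\| \wedge a_\alpha \wedge \|U = U_\beta\| \wedge \|W^2 = W_\beta\| \wedge a_\beta.$$

Since $\|U = U_\alpha\| \wedge \|U = U_\beta\| \le \|U_\alpha = U_\beta\|$ and $\|U_\alpha = U_\beta\| \wedge a_\alpha \wedge a_\beta = 0$
for $\alpha \ne \beta$ (see the definition of $a_\alpha$), it follows that in this sum we need
only consider the terms with $\alpha = \beta$. But such a term is $\le \|W^1 = W_\alpha\| \wedge
\|W^2 = W_\alpha\| \le \|W^1 = W^2\|$, as required.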


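7.11. $\|S\| = 1$. This is equivalent to the inequality

$$\|\langle U, W \rangle^B \in Y\| \le \|U \in X\|.$$

But by (10), the left-hand side equals

$$\bigvee_{\alpha} \|U = U_\alpha\| \wedge \|W = W_\alpha\| \wedge a_\alpha \le \bigvee_{\alpha} \|U = U_\alpha\| \wedge \|W = W_\alpha\| \wedge \|U_\alpha \in X\| \le \bigvee_{\alpha} \|U = U_\alpha\| \wedge \|U_\alpha \in X\| = \|U \in X\|.$$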


7.12. ||T'|| = 1. We must prove that for any U € V®, 


VE XIAN Aa < Vo WevUIAl\U,w)? € YI. (11) 
Weve 


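We first show that it suffices to prove (11) for $U \in D(X)$, i.e., for all $U_\alpha$, $\alpha \in I$. In fact, suppose (11) holds for all $U_\alpha$. Then for $U \in V^B$ we have

$$\|U \in X\| = \bigvee_{\alpha} \|U = U_\alpha\| \wedge \|U_\alpha \in X\|,$$

$$\|U \ne \varnothing\| = \bigvee_{U_1 \in V^B} \|U_1 \in U\|,$$

and hence

$$\begin{aligned}
\|U \in X\| \wedge \|U \ne \varnothing\|
&= \bigvee_{\alpha, U_1} \|U_1 \in U\| \wedge \|U = U_\alpha\| \wedge \|U_\alpha \in X\| \\
&\le \bigvee_{\alpha, U_1} \|U_1 \in U_\alpha\| \wedge \|U = U_\alpha\| \wedge \|U_\alpha \in X\| \quad \text{(by (III) in §5)} \\
&\le \bigvee_{\alpha} \|U_\alpha \ne \varnothing\| \wedge \|U = U_\alpha\| \wedge \|U_\alpha \in X\| \\
&\le \bigvee_{\alpha,\, W \in V^B} \|W \in U_\alpha\| \wedge \|(U_\alpha, W)^B \in Y\| \wedge \|U = U_\alpha\| \quad \text{(by (11) for } U_\alpha\text{)} \\
&\le \bigvee_{W} \|W \in U\| \wedge \|(U, W)^B \in Y\|.
\end{aligned}$$

(Here we used the fact that

$$\begin{aligned}
\|(U_\alpha, W)^B \in Y\| \wedge \|U = U_\alpha\|
&= \bigvee_{\beta} \|U_\alpha = U_\beta\| \wedge \|W = W_\beta\| \wedge a_\beta \wedge \|U = U_\alpha\| \\
&\le \bigvee_{\beta} \|U = U_\beta\| \wedge \|W = W_\beta\| \wedge a_\beta \\
&= \|(U, W)^B \in Y\|.)
\end{aligned}$$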
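Thus, it remains to prove (11) for $U_\alpha$, $\alpha \in I$. Now

$$\|U_\alpha \ne \varnothing\| = \|\exists w (w \in U_\alpha)\| = \bigvee_{W} \|W \in U_\alpha\| = \|W_\alpha \in U_\alpha\|.$$

Hence (11) can be rewritten

$$\|U_\alpha \in X\| \wedge \|W_\alpha \in U_\alpha\| \le \bigvee_{W} \|W \in U_\alpha\| \wedge \|(U_\alpha, W)^B \in Y\|. \tag{12}$$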


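We prove this by induction on $\alpha$. (12) is obvious for $\alpha = 0$, since the term on the right with $W = W_0$ coincides with the left-hand side. Suppose (12) holds for all $\beta < \alpha$.

By the definition of $a_\alpha$, we have

$$\|U_\alpha \in X\| = a_\alpha \vee \Bigl(\bigvee_{\beta<\alpha} \|U_\beta \in X\| \wedge \|U_\beta = U_\alpha\|\Bigr).$$

If we substitute this formula in the left-hand side of (12), we find that we must prove two inequalities:

$$a_\alpha \wedge \|W_\alpha \in U_\alpha\| \le \bigvee_{W} \|W \in U_\alpha\| \wedge \|(U_\alpha, W)^B \in Y\|, \tag{13}$$

$$\|U_\beta \in X\| \wedge \|U_\beta = U_\alpha\| \wedge \|W_\alpha \in U_\alpha\| \le \bigvee_{W} \|(U_\alpha, W)^B \in Y\| \wedge \|W \in U_\alpha\|, \quad \text{for all } \beta < \alpha. \tag{14}$$

The inequality (13) is obvious if we look at the term on the right with $W = W_\alpha$. The inequality (14) reduces to the induction assumption as follows. The left-hand side of (14) is

$$\begin{aligned}
&\le \|U_\beta \in X\| \wedge \|U_\beta = U_\alpha\| \wedge \|W_\alpha \in U_\beta\| \\
&\le \|U_\beta \in X\| \wedge \|U_\beta = U_\alpha\| \wedge \|W_\beta \in U_\beta\|
\end{aligned}$$

by the definition of $W_\beta$. Further, using the induction assumption and extensionality, we have

$$\begin{aligned}
\|U_\beta = U_\alpha\| \wedge \|U_\beta \in X\| \wedge \|W_\beta \in U_\beta\|
&\le \bigvee_{W} \|W \in U_\beta\| \wedge \|(U_\beta, W)^B \in Y\| \wedge \|U_\beta = U_\alpha\| \\
&\le \bigvee_{W} \|W \in U_\alpha\| \wedge \|(U_\alpha, W)^B \in Y\|,
\end{aligned}$$

which completes the verification of the axiom of choice.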


8 The Continuum Hypothesis Is “False” for Suitable B 


8.1. We recall (Lemma 7.2(a)) that the set $X \in V^B$ collects the sets $\{X_i\}$ with probabilities $a_i \in B$ $(i \in I)$ if $\|Y \in X\| = \bigvee_i \|Y = X_i\| \wedge a_i$ for all $Y$. Using this definition, we can introduce a useful canonical mapping $t \mapsto \check{t}$ from the von Neumann universe $V$ to the universe $V^B$. Let $\check{\varnothing} = \varnothing$ (recall that $\|Y \in \varnothing\| = 0$ for all $Y$), and if $\check{s}$ has already been defined for all $s \in V_\alpha$, then for $t \in V_{\alpha+1}$, we let $\check{t}$ collect all the $\check{s}$ for $s \in t$ with probability 1. In other words, for any $Y \in V^B$,

$$\|Y \in \check{t}\| = \bigvee_{s \in t} \|Y = \check{s}\|. \tag{1}$$

(Here the collecting set $\check{t}$ is not uniquely defined, i.e., it is defined only modulo equivalence, so that, strictly speaking, we should also specify the rank of $\check{t}$, for example by saying that it equals the rank of $t$. This is not essential for us, however, since we shall be interested only in the truth functions, which do not change if we replace an object by an equivalent object.)

We now formulate some additional conditions (besides completeness) that 
must be imposed on the Boolean algebra B for the purposes of this section. 
Recall that $\omega_0$ is the first infinite ordinal, $\omega_1$ is the first ordinal having cardinality $> \omega_0$, and $\omega_2$ is the first ordinal having cardinality $> \omega_1$.


8.2. Conditions on B. 


(a) The countable chain condition, which, we recall, says that if we have a family of elements $\{a_i\}$, $i \in I$, such that $a_i \ne 0$ and $a_i \wedge a_j = 0$ for $i \ne j$, then $I$ is at most a countable set.

(b) There exists a family of elements $b(n, \alpha) \in B$, indexed by the set $\omega_0 \times \omega_2$, with the following property: if $Z(\alpha)$ collects the elements $\check{n}$, $n \in \omega_0$, with probabilities $b(n, \alpha)$, then $\|Z(\alpha) = Z(\beta)\| = 0$ for $\alpha \ne \beta$, $\alpha, \beta \in \omega_2$.

The second condition has the following intuitive meaning. It is easy to see that $\|Z(\alpha) \subset \check{\omega}_0\| = 1$. In fact, this equality is equivalent to $\|\forall x (x \in Z(\alpha) \Rightarrow x \in \check{\omega}_0)\| = 1$, i.e., to

$$\forall X \in V^B, \quad \|X \in Z(\alpha)\| \le \|X \in \check{\omega}_0\|,$$

and this is obvious from (1), since $\check{\omega}_0$ collects the $\check{n}$ with probability 1, and $Z(\alpha)$ collects the $\check{n}$ with probabilities $b(n, \alpha) \le 1$.

Thus, condition (b) means that we can find $\omega_2$ distinct subsets $Z(\alpha) \subset \check{\omega}_0$, so that, in the naive sense, we have $\operatorname{card} P(\check{\omega}_0) > \omega_1$. This is precisely the negation of the continuum hypothesis. Of course, it is still necessary to show that this intuitive idea can be made into a proof.


8.3. The existence of $B$ with the required properties. We could use measurable sets, as in §3. However, in order to vary our approach, and to prepare for §9, we give another construction. Let $\{0,1\}$ be the discrete two-point space, let $I = \omega_0 \times \omega_2$, and let $S = \{0,1\}^I$ be the space of vectors whose coordinates are indexed by $I$ and take the values 0 or 1. We introduce the direct product topology on $S$. It has a standard basis of open sets, each consisting of all vectors whose coordinates indexed by some finite subset $J \subset I$ are fixed.

If $a \subset S$, we set

$$a' = \text{the complement of the closure of } a \text{ in } S,$$

and we set $a'' = (a')'$. Sets $a \subset S$ with $a'' = a$ are called regular open sets in $S$.


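8.4. Theorem. Let

$$B = \{a \subset S \mid a'' = a\}, \qquad a \wedge b = a \cap b, \qquad a \vee b = (a \cup b)''.$$

Then $B$ with the operations $\wedge$, $\vee$, and $'$ is a complete Boolean algebra with the countable chain condition, and $\bigvee_i a_i = (\bigcup_i a_i)''$ for any family of $a_i \in B$.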


We omit the proof (see J. B. Rosser, Simplified Independence Proofs, 
Academic Press, New York, 1969, Chapter 2). 
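
The definitions of $a'$, $a''$, and the operations in 8.4 can be tried out on a small finite space. The following is only an illustrative sketch: the three-point space and its topology `OPEN` are assumptions chosen for the example (they are not the space $S = \{0,1\}^I$ of 8.3), and the function names are hypothetical.

```python
from itertools import combinations

# Assumed toy space: three points with a non-discrete topology.
X = frozenset({0, 1, 2})
OPEN = {frozenset(), frozenset({0}), frozenset({2}), frozenset({0, 2}), X}
CLOSED = {X - u for u in OPEN}


def closure(a):
    """Smallest closed set containing a."""
    result = X
    for c in CLOSED:
        if a <= c:
            result &= c
    return result


def prime(a):
    """a' = the complement of the closure of a."""
    return X - closure(a)


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# B = {a : a'' = a}, the regular open sets of the toy space.
B = {a for a in subsets(X) if prime(prime(a)) == a}


def meet(a, b):
    """a ∧ b = a ∩ b."""
    return a & b


def join(a, b):
    """a ∨ b = (a ∪ b)''."""
    return prime(prime(a | b))
```

In this toy space the open set $\{0, 2\}$ is not regular (its closure is the whole space), so the join of $\{0\}$ and $\{2\}$ is the whole space rather than their union; this is the reason for the double prime in the definition of $\vee$.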


8.5. Lemma. Under the conditions in 8.4, let

$$b(n, \alpha) = \text{the set of vectors with 1 in the } (n, \alpha) \text{ place},$$

and let $Z(\alpha)$ be defined as in 8.2(b). Then

$$\|Z(\alpha) = Z(\beta)\| = 0 \quad \text{for } \alpha \ne \beta.$$

PROOF. By formula (5) in §4, we have

$$\|Z(\alpha) = Z(\beta)\| = \bigwedge_{n \in \omega_0} \bigl(b(n, \alpha)' \vee b(n, \beta)\bigr) \wedge \bigl(b(n, \alpha) \vee b(n, \beta)'\bigr).$$

The right side can become larger only if we replace $\wedge$ by $\cap$ and $\vee$ by $\cup$; here the primes $'$ coincide with the ordinary complements. If we had $\|Z(\alpha) = Z(\beta)\| \ne 0$, then there would exist an element $X$ in the standard basis of the topology (see the beginning of 8.3) that is contained in

$$\bigcap_{n \in \omega_0} \bigl(b(n, \alpha) \cap b(n, \beta)\bigr) \cup \bigl(b(n, \alpha)' \cap b(n, \beta)'\bigr).$$

But this intersection consists of all vectors having the same $(n, \alpha)$-coordinate and $(n, \beta)$-coordinate for all $n$, while all coordinates except for a finite number range freely in any element $X$ of the standard basis of the topology. Since $\alpha \ne \beta$, no such $X$ can lie in this intersection, a contradiction.


8.6. Formulation of the negation of the continuum hypothesis. We shall prove that the following is “true”:

$$\begin{aligned}
\neg\mathrm{CH}:\quad & \forall x \bigl( (\text{``$x$ is an ordinal''} \wedge \text{``$x$ is not finite''} \wedge \forall y (y \in x \Rightarrow \text{``$y$ is finite''})) \\
& \quad \Rightarrow \exists w (\text{``there is no function from $x$ onto all of $w$''} \wedge \text{``there is no function from $w$ onto $P(x)$''}) \bigr).
\end{aligned}$$

Here:

$$x \text{ is finite}: \quad \forall y (y \subset x \wedge y \ne x \Rightarrow \text{``there is no function from $y$ onto all of $x$''}).$$

We leave the translation of the other abbreviated notation to the reader.

The premise in $\neg\mathrm{CH}$ says that “$x$ is the first infinite ordinal,” and the conclusion says that “$w$ is a set having cardinality intermediate between that of $x$ and that of $P(x)$.” We shall abbreviate $\neg\mathrm{CH}$ as follows:

$$\forall x (P(x) \Rightarrow \exists w (Q_1(x, w) \wedge Q_2(x, w))). \tag{2}$$


8.7. Reduction Lemma. Let $P(x)$ and $Q(x)$ be two formulas in the Zermelo–Fraenkel language having one free variable $x$ and satisfying the properties

The formula $\exists! x\, P(x)$ is deducible from the axioms, and

$X_0 \in V^B$ is an element such that $\|P(X_0)\| = 1$.

Then $\|P(X)\| = \|X = X_0\|$ for all $X$, and if $\|Q(X_0)\| = 1$, it follows that

$$\|\forall x (P(x) \Rightarrow Q(x))\| = 1.$$

PROOF. We first note that $\|\exists x\, P(x)\| \ge \|\exists! x\, P(x)\| = 1$, since all the axioms are “true” in $V^B$, and the rules of deduction preserve “truth.” It hence follows from Lemma 7.4 that there exists an object $X_0 \in V^B$ with $\|P(X_0)\| = 1$. Further, $P(x) \wedge P(y) \Rightarrow x = y$ is also deducible, so that if we apply this with $X$ in place of $x$ and $X_0$ in place of $y$, we find that

$$\|P(X)\| \le \|X = X_0\|. \tag{3}$$

But $\|P(X)\| \wedge \|X = X_0\| = \|P(X_0)\| \wedge \|X = X_0\| = \|X = X_0\|$. Hence the inequality in (3) may be replaced by equality.

Finally, we suppose that $\|Q(X_0)\| = 1$. Then, by what was just proved,

$$\begin{aligned}
\|P(X)\| &= \|Q(X_0)\| \wedge \|X = X_0\| = \|Q(X)\| \wedge \|X = X_0\| \\
&= \|Q(X)\| \wedge \|P(X)\|,
\end{aligned}$$

so that $\|P(X)\| \le \|Q(X)\|$, and $\|\forall x (P(x) \Rightarrow Q(x))\| = 1$.


This lemma can be applied to $\neg\mathrm{CH}$ in the form (2), since the formula $\exists! x\, P(x)$, where $P(x)$ is the premise “$x$ is the first infinite ordinal,” is deducible from the axioms. We shall not give this formal deduction, and shall consider the uniqueness of $\omega_0$ to be common knowledge. Now, by Lemma 8.7, to verify $\neg\mathrm{CH}$ it suffices to prove the following facts:

8.8. $\|P(\check{\omega}_0)\| = 1$. (In other words, $\check{\omega}_0$ plays the role of $X_0$ in our situation.)

8.9. $\|Q_1(\check{\omega}_0, \check{\omega}_1)\| = 1$.

8.10. $\|Q_2(\check{\omega}_0, \check{\omega}_1)\| = 1$. (This then implies that $\|\exists w (Q_1(\check{\omega}_0, w) \wedge Q_2(\check{\omega}_0, w))\| = 1$, and completes the verification of the conditions of the lemma.) 8.8 is verified almost mechanically, and we leave it as an exercise.


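8.11. VERIFICATION OF 8.9. We must show that if $B$ satisfies the countable chain condition, then

$$\|\exists \text{ a function from } \check{\omega}_0 \text{ onto all of } \check{\omega}_1\| = 0.$$

The proof that follows carries over word for word to the more general case, when instead of $\omega_0$ and $\omega_1$, we take any pair $s, t \in V$ such that $\operatorname{card} s < \operatorname{card} t$ and $\operatorname{card} s$ is infinite.

We suppose that

$$0 \ne a = \|\exists f (f \text{ is a function} \wedge \forall y (y \in \check{\omega}_1 \Rightarrow \exists x (x \in \check{\omega}_0 \wedge (x, y) \in f)))\|,$$

and we show that this leads to a contradiction. There must exist an $F \in V^B$ such that

$$a \le \|F \text{ is a function}\| \wedge \Bigl(\bigwedge_{Y} \cdots\Bigr).$$

For every $\alpha \in \omega_1$, we consider the term in $\bigwedge_Y \cdots$ corresponding to $Y = \check{\alpha}$ and use the fact that $\|\check{\alpha} \in \check{\omega}_1\| = 1$. We obtain

$$a \le \|F \text{ is a function}\| \wedge \Bigl(\bigvee_{X} \|X \in \check{\omega}_0\| \wedge \|(X, \check{\alpha})^B \in F\|\Bigr). \tag{4}$$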

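By (1), we have

$$\begin{aligned}
\|X \in \check{\omega}_0\| \wedge \|(X, \check{\alpha})^B \in F\|
&= \bigvee_{n < \omega_0} \|X = \check{n}\| \wedge \|(X, \check{\alpha})^B \in F\| \\
&= \bigvee_{n < \omega_0} \|X = \check{n}\| \wedge \|(\check{n}, \check{\alpha})^B \in F\|,
\end{aligned}$$

so that if we sum first over $X$ and then over $n$, we may write (4) in the form

$$a \le \|F \text{ is a function}\| \wedge \Bigl(\bigvee_{n < \omega_0} \|(\check{n}, \check{\alpha})^B \in F\|\Bigr).$$

Hence, for every $\alpha < \omega_1$ there is an $n(\alpha) < \omega_0$ such that

$$\|F \text{ is a function}\| \wedge \|(\check{n}(\alpha), \check{\alpha})^B \in F\| \ne 0.$$

Then there exist an $n_0$ and a subset $J \subset \omega_1$ of cardinality $\omega_1$ such that

$$0 \ne a_\alpha = \|F \text{ is a function}\| \wedge \|(\check{n}_0, \check{\alpha})^B \in F\|, \quad \text{for all } \alpha \in J.$$

It remains to show that $a_\alpha \wedge a_\beta = 0$ for $\alpha \ne \beta$, which contradicts the countable chain condition on $B$. Now by the definition of a function

$$a_\alpha \wedge a_\beta = \|F \text{ is a function}\| \wedge \|(\check{n}_0, \check{\alpha})^B \in F\| \wedge \|(\check{n}_0, \check{\beta})^B \in F\| \le \|\check{\alpha} = \check{\beta}\|,$$

so that it suffices to show that $\alpha \ne \beta$ implies $\|\check{\alpha} = \check{\beta}\| = 0$.

In fact, if, say, $\gamma \in \alpha$ but $\gamma \notin \beta$, then the formula (5) in §4 for $\|\check{\alpha} = \check{\beta}\|$ has a zero term, namely $\|\check{\gamma} \in \check{\alpha}\|' \vee \|\check{\gamma} \in \check{\beta}\|$. (To check that $\|\check{\gamma} \in \check{\beta}\| = 0$ if $\gamma \notin \beta$ we have to know that $\|\check{\gamma} = \check{\delta}\| = 0$ if $\gamma \ne \delta$, but we have to know this only for $\gamma$ and $\delta$ of lower rank than $\alpha$ and $\beta$, so that the detailed proof uses induction on the rank.)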


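8.12. VERIFICATION OF 8.10. We must show that

$$\|\exists \text{ a function from } \check{\omega}_1 \text{ onto } P(\check{\omega}_0)\| = 0,$$

that is, that

$$\|\exists g (g \text{ is a function} \wedge \forall z (z \subset \check{\omega}_0 \Rightarrow \exists y (y \in \check{\omega}_1 \wedge (y, z) \in g)))\| = 0.$$

Suppose that for some $G \in V^B$ we have

$$0 \ne a = \|G \text{ is a function}\| \wedge \Bigl(\bigwedge_{Z} \cdots\Bigr).$$

For every $\alpha < \omega_2$ we consider the term corresponding to $Z = Z(\alpha)$ (see the definition in 8.2 and 8.5), and we use the fact that $\|Z(\alpha) \subset \check{\omega}_0\| = 1$ (see 8.2). We obtain

$$0 \ne a \le \|G \text{ is a function}\| \wedge \Bigl(\bigvee_{Y} \|Y \in \check{\omega}_1\| \wedge \|(Y, Z(\alpha))^B \in G\|\Bigr). \tag{5}$$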

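By (1), we have

$$\begin{aligned}
\|Y \in \check{\omega}_1\| \wedge \|(Y, Z(\alpha))^B \in G\|
&= \bigvee_{\beta < \omega_1} \|Y = \check{\beta}\| \wedge \|(Y, Z(\alpha))^B \in G\| \\
&= \bigvee_{\beta < \omega_1} \|Y = \check{\beta}\| \wedge \|(\check{\beta}, Z(\alpha))^B \in G\|.
\end{aligned}$$

Summing first over $Y$, we rewrite (5) in the form

$$0 \ne a \le \|G \text{ is a function}\| \wedge \bigvee_{\beta < \omega_1} \|(\check{\beta}, Z(\alpha))^B \in G\|.$$

Hence, for every $\alpha < \omega_2$ there is a $\beta(\alpha) < \omega_1$ such that

$$0 \ne a_\alpha = \|G \text{ is a function}\| \wedge \|(\check{\beta}(\alpha), Z(\alpha))^B \in G\|.$$

Then there exist a $\beta_0 < \omega_1$ and a subset $J \subset \omega_2$ of cardinality $\omega_2$ such that

$$0 \ne a_\alpha = \|G \text{ is a function}\| \wedge \|(\check{\beta}_0, Z(\alpha))^B \in G\|, \quad \text{for all } \alpha \in J.$$

As in 8.11, we obtain a contradiction to the countable chain condition if we show that $a_\alpha \wedge a_\beta = 0$ for $\alpha \ne \beta$. But this follows from

$$a_\alpha \wedge a_\beta \le \|Z(\alpha) = Z(\beta)\| = 0$$

by Lemma 8.5.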


9 Forcing 


9.1. By choosing the Boolean algebra $B$ in various ways, one can use the corresponding models $V^B$ to show that many different assertions $P$ are consistent with the Zermelo–Fraenkel axioms. But each choice of $B$ for a given $P$ such that $\|P\| = 1$ in $V^B$ presents a separate problem.

There is another interpretation of this method that is closer to Cohen’s original idea. From this point of view we start not with a universe $V$ and a Boolean algebra $B$, but with an (often countable) transitive model $M$ and an ordered set $C$ of “forcing conditions.” It is usually more obvious how to choose a suitable $C$ than how to choose a suitable $B$ for proving that a given proposition $P$ is consistent. One might say that $B$ embodies the “physical meaning” of the problem, while $C$ expresses its “logical meaning.” Anyway, it is not difficult to go from one version to the other, and in either case it takes about the same amount of work to verify the “truth” of the axioms.

In this section we discuss the second version, using forcing, with most of the 
proofs omitted. The details can be found in Cohen’s original article, and also 
in Jech’s book Lectures in Set Theory with Particular Emphasis on the Method 
of Forcing, Springer-Verlag Lecture Notes in Mathematics 217, 1971, and in 
J. R. Shoenfield’s article “Unramified forcing,” Proc. Symp. in Pure Math., vol. 
13, 1, 357-381 (American Math. Soc, Providence, 1972). 


9.2. Before introducing the general concept of forcing, we consider a special case 
that arises in a typical problem. 

Let $X$ and $Y$ be two sets, for example $P(\omega_0)$ and $\omega_2$. We consider the proposition $P$: “$\operatorname{card} X \ge \operatorname{card} Y$,” which in this special case is the negation of the CH. One possible approach to constructing a model (in the usual rather than Boolean sense) of $L_1\mathrm{Set}$ in which $P$ is true is as follows.

We take our original countable transitive model $M$ of set theory (i.e., of the special axioms of $L_1\mathrm{Set}$), which was shown to exist in §7 of Chapter II. Let $X_M$ and $Y_M$ be the “representatives” of $X$ and $Y$ in $M$. (This means that if, say, $X$ is defined by the formula $\exists! x\, P(x)$, then $X_M = x^\xi$, where $\xi$ is a point of the interpretation class for which $|P(x)|_M(\xi) = 1$; see §7 of Chapter II.) We assume that $X_M$ is infinite and $Y_M$ is nonempty. Then “from an external point of view” $X_M$ is countable and $Y_M$ is at most countable, so there automatically exists a function $F$ that maps $X_M$ onto all of $Y_M$. A natural idea would be to add (the graph of) $F$ to $M$, i.e., to consider the least countable model $N$ of the axioms that contains $M$ and $F$. Then $N$ has a map from $X_M$ onto $Y_M$, but it is very likely that $X_N \ne X_M$ and $Y_N \ne Y_M$. What we need in $N$ is a map from $X_N$ onto $Y_N$.

As we have shown when discussing Skolem’s paradox in Chapter II, at least for certain pairs (such as $X = \omega_0$, $Y = P(\omega_0)$), we cannot obtain a map from $X$ onto $Y$ in this way. In those cases in which we can construct such a map, we must choose $F$ very carefully. Cohen’s idea was that $F$, rather than being chosen so as to satisfy some conditions, should be chosen so as to avoid reflecting any specific properties of $M$, i.e., $F$ should be “generic.” We shall formulate this more precisely.

It turns out to be important to start not by choosing $F$ directly, but by choosing the set

$$G = \{\text{restrictions of } F \text{ to finite subsets of } X_M\}.$$

Clearly, $F$ is uniquely determined from $G$: $F = \bigcup_{g \in G} g$ (recall that a function is the same as its graph). Hence $F$ is contained in any model that contains $G$. But now we must give an axiomatic characterization of the suitable $G$ without using $F$ explicitly. Here are the properties that $G$ must satisfy:


9.3.

(a) $G \subset C$, where $C$ is the set of maps from finite subsets of $X_M$ to $Y_M$. It is important that $C \in M$, because the formula in $L_1\mathrm{Set}$ that defines $C$ is $(M, V)$-absolute. We need this remark in order to motivate the general definitions later.

(b) $\varnothing \in G$; if $p \in G$ and $q \in C$, where $q \subset p$, then $q \in G$; for any $p_1, p_2 \in G$ there is $p \in G$ such that $p \supset p_1 \cup p_2$.

Suppose we have chosen such a set $G$ of maps from finite subsets of $X_M$ to $Y_M$. Then $\bigcup_{g \in G} g$ is also a map from some subset of $X_M$ to $Y_M$. In order for this map to be defined on all of $X_M$, and to be surjective, it is necessary and sufficient for the following additional conditions to hold:

$$\forall Z \in X_M, \quad G \cap \{p \in C \mid p \text{ is defined at } Z\} \ne \varnothing,$$

$$\forall Z \in Y_M, \quad G \cap \{q \in C \mid q \text{ takes the value } Z\} \ne \varnothing.$$


We call a subset $D \subset C$ dense in $C$ if for all $p \in C$ there is a $q \in D$ with $p \subset q$. The set of maps $p$ defined at $Z$ and the set of maps $q$ taking the value $Z$ are dense, and, moreover, are elements of $M$ by the same consideration of $(M, V)$-absoluteness. Hence the two requirements at the end of the last paragraph are included in the last condition, that $G$ be generic:

(c) $G \cap D \ne \varnothing$ for all dense subsets $D \subset C$ that are elements of $M$.

Although it is not yet evident, it is precisely the condition that $G$ be generic that ensures that the properties of the sets $X_M$ and $Y_M$ will be preserved as much as possible after we add $G$ to the model.

We now define the general concept of “forcing conditions.” 


9.4. Forcing conditions. These are the elements in any partially ordered set $(C, \le)$ that has a maximal element 1. Usually $C$ and $\le$ lie in the original model $M$.

A set $G$ is called generic over $M$ (relative to $C$) if the following conditions hold:

(a) $G \subset C$;

(b) $1 \in G$; if $p \in G$ and $q \in C$, where $q \ge p$, then $q \in G$; for any $p_1, p_2 \in G$ there is a $p \in G$ such that $p \le p_1$ and $p \le p_2$;

(c) $G \cap D \ne \varnothing$ for all dense subsets $D \subset C$ with $D \in M$ ($D$ is dense if for all $p \in C$ there is a $q \in D$ with $q \le p$).

If the reader compares this definition with the special case in 9.3, he or she will notice that we have replaced $\subset$ by $\ge$ and $\varnothing$ by 1. This is in keeping with Cohen’s original point of view, according to which $p \ge q$ if, when $p$ is considered as a “condition” imposed, say, on $F$, more $F$’s satisfy $p$ than $q$. (Each $p$ fixes the restriction of $F$ to some finite subset of $X_M$.)


9.5. The existence of generic sets. Let $M$ and $C$ be fixed. If $M \cap P(C)$ is countable, then for every $p \in C$ there exists a generic set $G$ containing $p$.

In fact, we index the elements of $M \cap P(C)$ as $X_1, X_2, X_3, \ldots$ and then set $p_1 = p$ and

$$p_{n+1} = \begin{cases} p_n, & \text{if } q \not\le p_n \text{ for all } q \in X_n; \\ \text{any } q \in X_n \text{ such that } q \le p_n, & \text{otherwise.} \end{cases}$$

Finally, we set $G = \{q \in C \mid \exists n\, (p_n \le q)\}$.

Conditions (a) and (b) for $G$ to be generic are trivial to verify. Condition (c) follows because if $D \in M$ and $D$ is dense, then there exist $n$ and $q$ for which $D = X_n$, $q \in X_n$, and $q \le p_n$, so that $p_{n+1} \in D \cap G$.
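
The recursion in 9.5 can be followed step by step on a finite toy example. The sketch below is only an illustration under stated assumptions: the conditions are partial maps from a three-element set `DOMAIN` to `VALUES` (stand-ins for finite maps from $X_M$ to $Y_M$), a finite list of subsets plays the role of the enumeration $X_1, X_2, \ldots$ of $M \cap P(C)$, and all function names are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1, 2)   # assumed stand-in for X_M
VALUES = (0, 1)      # assumed stand-in for Y_M


def all_conditions():
    """All partial maps DOMAIN -> VALUES, as frozensets of (x, y) pairs."""
    conds = []
    for choice in product((None,) + VALUES, repeat=len(DOMAIN)):
        conds.append(frozenset((x, y) for x, y in zip(DOMAIN, choice)
                               if y is not None))
    return conds


def leq(q, p):
    """q <= p means that q extends p, i.e. q is the stronger condition."""
    return q >= p  # superset test on frozensets


def defined_at(C, z):
    """The set of conditions in C that are defined at the point z."""
    return {p for p in C if any(x == z for x, _ in p)}


def is_dense(D, C):
    """D is dense if every p in C has some q in D with q <= p."""
    return all(any(leq(q, p) for q in D) for p in C)


def generic_filter(C, listed_subsets, p):
    """Build p_1, p_2, ... as in 9.5, then G = {q in C : some p_n <= q}."""
    chain = [p]
    for X_n in listed_subsets:
        p_n = chain[-1]
        below = [q for q in X_n if leq(q, p_n)]
        # Keep p_n if nothing in X_n lies below it; otherwise pick one
        # such q (deterministically, so the example is reproducible).
        chain.append(min(below, key=sorted) if below else p_n)
    G = {q for q in C if any(leq(p_n, q) for p_n in chain)}
    return chain, G
```

For instance, calling `generic_filter` with the three sets `defined_at(C, z)` in the list forces the resulting `G` to contain a condition defined at every point of `DOMAIN`, which is the finite analogue of the union of $G$ being defined on all of $X_M$. In a finite poset this is much weaker than genericity over a countable model; the sketch only mirrors the bookkeeping of the construction.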


9.6. The connection with Boolean models. As mentioned before, we have considerable freedom in our choice of the set $C$ of forcing conditions and the generic subset $G \subset C$. Exactly how one “forces” a given proposition $P$ was explained briefly in 9.2. We now show how to construct an axiom model $M[G]$ that contains $M$ and $G$, once $C$ and $G$ have already been chosen.

The article by Shoenfield gives a direct construction, but we shall make use of an analogy with $V^B$, as in Jech’s presentation. In this approach $M[G]$ is constructed in three basic steps:

(a) Corresponding to the set $C$ we construct a canonical complete Boolean algebra $B$.

(b) We construct a Boolean universe $M^B$ over $B$ that is “relativized” by means of $M$.

(c) We construct a canonical maximal ideal $I_G \subset B$ determined by $G$ and the “fiber” of the universe $M^B$ over the quotient algebra $B/I_G = \{0, 1\}$. It is this fiber that will be the model $M[G]$.


We now discuss these steps separately and in more detail. 


9.7. Ordered sets and Boolean algebras. Every Boolean algebra $B$ has a canonical partial ordering: $a \le b$ if $a \wedge b = a$. All elements of the structure of $B$ are uniquely determined by this partial ordering. The induced ordering on $B - \{0\}$ is separable. By definition, this means that if $a, b \ne 0$ and $a \not\le b$, then there exists $c \le a$, $c \ne 0$, such that there is no $d \ne 0$ for which $d \le b$ and $d \le c$. (It suffices to take $c = a \wedge b'$.) Such $b$ and $c$ are called disjoint.
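
The separability property can be checked exhaustively in a small Boolean algebra. The sketch below assumes, purely for illustration, the algebra of all subsets of a three-element set `ATOMS`, with $\le$ as inclusion and $'$ as set complement; the helper names are hypothetical.

```python
from itertools import combinations

ATOMS = frozenset({0, 1, 2})  # assumed toy algebra: all subsets of ATOMS


def elements():
    """All elements of the algebra, as frozensets."""
    items = sorted(ATOMS)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def leq(a, b):
    """a <= b iff a ∧ b = a, which for sets is inclusion."""
    return a & b == a


def witness(a, b):
    """The element c = a ∧ b' suggested in the text."""
    return a & (ATOMS - b)


def disjoint(b, c, algebra):
    """True if no nonzero d satisfies d <= b and d <= c."""
    return not any(d and leq(d, b) and leq(d, c) for d in algebra)
```

Looping over all nonzero pairs with $a \not\le b$ and confirming that `witness(a, b)` is nonzero, lies below `a`, and is disjoint from `b` reproduces the parenthetical remark above for this particular algebra.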

Now let $C$ be a fixed partially ordered set. We consider the class of (nonstrictly) order-preserving maps of $C$ into different complete Boolean algebras $B$ such that 0 is not contained in the image.


9.8. Proposition. In this class of maps there exists a unique universal map $e : C \to B$ with the following properties:

(a) $e(C)$ is the maximal separable ordered quotient set of $C$ such that $c_1, c_2 \in C$ are disjoint $\Rightarrow$ $e(c_1), e(c_2) \in B$ are disjoint;

(b) $e(C)$ is dense in $B - \{0\}$.

$B$ can be realized as the algebra of regular open sets in the space $C$ with the topology defined by the basis $U_c = \{a \in C \mid a \le c\}$, $c \in C$.

Now we can indicate how $I_G$ is constructed from the generic subset $G \subset C$:

$$G_1 = \{b \in B \mid \exists p \in G,\ e(p) \le b\}, \qquad I_G = B \setminus G_1.$$

It is not hard to prove that $I_G$ is a maximal ideal in $B$, i.e., the kernel of a Boolean homomorphism $B \to \{0, 1\}$. The set $G_1$ is precisely the preimage of 1 under this homomorphism. Since $G$ is generic in $C$, we have the following property of $G_1$: for any subset $A \subset B$ such that $\bigvee_{a \in A} a = 1$ and $a_1 \wedge a_2 = 0$ whenever $a_1 \ne a_2 \in A$, there exists a unique element $a \in A \cap G_1$.


9.9. The universe $M^B$. This universe is constructed from $M$ and $B$ in exactly the same way as $V^B$ was constructed from $V$ and $B$, with one essential difference: all constructions are relativized with respect to $M$. This means that instead of $B$, we take the algebra $B_M$ that “represents” $B$ in $M$ (see 9.2); only ordinals $\alpha \in M$ are used in the construction of $M^B$, and so on. A rigorous presentation of these constructions would require much more formalization using the expressive means in $L_1\mathrm{Set}$ than seems desirable in this section. In such a presentation both the general plan and the details of the work would remain essentially the same as before.

The basic result of these constructions is that to every closed formula $P$ in $L_1\mathrm{Set}$ with constants in $M$ corresponds a Boolean truth value $\|P\| \in B_M$. Here the value 1 corresponds to the axioms, and deductions preserve “truth.”

The next step cuts down the size of $M^B$, again giving a transitive standard submodel.


9.10. Construction of $M[G]$. For brevity, we shall write $B$ instead of $B_M$, and so on. The construction essentially consists in going from “random” sets $X, Y \in M^B$ to “determined” sets $X, Y$, where we say that $X \in Y$ if the truth value $\|X \in Y\|$ goes to 1 under the homomorphism $B \to B/I_G = \{0, 1\}$, i.e., if $\|X \in Y\| \in G_1$ (see 9.8). More precisely, we inductively define

$$i(\varnothing) = \varnothing, \qquad i(Y) = \{i(X) \mid \|X \in Y\| \in G_1\},$$

and let $M[G]$ denote the image of the map $i : M^B \to V$. This notation is justified by the following result. Suppose that $C$ and $\le$ belong to $M$ and that the subset $G \subset C$ is generic.


9.11. Proposition. $M[G]$ is a model for the Zermelo–Fraenkel axioms that contains $M$ and $G$. If $M$ is countable, then $M[G]$ is the least such model.

$M[G]$ contains $M$ for the following reason. If we let $X \mapsto \check{X}$ denote the map $M \to M^B$ that is constructed as in 8.1, then it is easy to show that $i(\check{X}) = X$.

$M[G]$ contains $G$ because $G = i(G')$, where $G'$ is the object in $M^B$ that collects all the $\check{b}$, $b \in B$, with probability $b$.

$M[G]$ is an axiom model basically because $M^B$ is a Boolean axiom model. However, here we use in an essential way the assumption that $G$ is generic.


(Shoenfield verifies this result directly, without using M®.) 


9.12. EXAMPLE. We return to the assertion “$\operatorname{card} P(\omega_0) \ge \omega_2$” in 9.2. By the above discussion, to prove that it is consistent with the axioms we choose a countable model $M$ and then set

$$C = \{\text{maps of finite subsets of } P(\omega_0) \text{ to } \omega_2\}, \qquad G \subset C = \text{a generic subset of } C.$$

If we consider a map from a subset of $P(\omega_0)$ to $\omega_2$ as a function from $\omega_0 \times \omega_2$ to $\{0, 1\}$, and if, instead of “relative” constructions in $M$, we consider “absolute” constructions in $V$, then the Boolean algebra $B$ that we obtain from $C$ turns out to be the same algebra that was constructed in 8.3 and 8.4. This explains the appearance of $B$. The ideal $I_G$ did not play any role in §8 because we were not trying to construct a standard model.


9.13. We conclude with a very general theorem of Easton, which shows how little we understand the behavior of the function $2^\kappa$ ($\kappa$ a cardinal).

Let $\alpha$ be a limit ordinal. Its cofinality $\operatorname{cf}(\alpha)$ is the least ordinal $\beta$ such that $\alpha$ is the union of $\beta$ ordinals less than $\alpha$. An infinite cardinal $\kappa$ is called regular if $\operatorname{cf}(\kappa) = \kappa$ and is called singular if $\operatorname{cf}(\kappa) < \kappa$. König (1905) proved that

$$\operatorname{cf}(2^\kappa) > \kappa.$$

9.14. Theorem (Easton, 1965). Let $F$ be any (nonstrictly) monotonic function on a subclass of the regular cardinals that takes values in the class of cardinals and that satisfies: $\operatorname{cf}(\aleph_{F(\kappa)}) > \aleph_\kappa$. Then the assertion “$\forall$ regular $\kappa \in \operatorname{dom} F$, $2^{\aleph_\kappa} = \aleph_{F(\kappa)}$” does not contradict the Zermelo–Fraenkel axioms.


If the domain of $F$ is a set, Easton’s theorem can be obtained using a model of the form $M[G]$, where $M$ is a model in which the generalized continuum hypothesis holds (Gödel proved that such an $M$ exists; see the next chapter). If the domain of $F$ is a class (for example, the class of all regular cardinals), the concept of forcing must be generalized to the case that $C$ is a class.

For singular cardinals $\kappa$, the following result is known (Silver’s theorem).

Let $\kappa$ be singular, $\operatorname{cf}(\kappa)$ uncountable. Denote by $\kappa^+$ the successor cardinal to $\kappa$. If $2^\lambda = \lambda^+$ for all infinite cardinals $\lambda < \kappa$, then $2^\kappa = \kappa^+$.


IV 


The Continuum Problem and Constructible Sets 


1 Gödel’s Constructible Universe


1.1. In this section we introduce the subclass $L \subset V$, “Gödel’s constructible universe,” and establish its fundamental properties. Perhaps the shortest description of $L$ is that it is the smallest transitive model of the axioms of $L_1\mathrm{Set}$ that contains all the ordinals. But the working definition of $L$, from which the name “constructible universe” is derived, is rather different.


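We consider the following operations $F_1, \ldots, F_8$ on sets:

$$\begin{aligned}
F_1(X, Y) &= \{X, Y\}, \\
F_2(X, Y) &= X \setminus Y, \\
F_3(X, Y) &= X \times Y, \\
F_4(X) &= \{U \mid \exists W ((U, W) \in X)\} = \operatorname{dom} X, \\
F_5(X) &= \{(U, W) \mid U, W \in X;\ U \in W\}, \\
F_6(X) &= \{(U_1, U_2, U_3) \mid (U_2, U_3, U_1) \in X\}, \\
F_7(X) &= \{(U_1, U_2, U_3) \mid (U_3, U_2, U_1) \in X\}, \\
F_8(X) &= \{(U_1, U_2, U_3) \mid (U_1, U_3, U_2) \in X\}.
\end{aligned}$$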
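
To get a feeling for these operations, the first five can be evaluated on hereditarily finite sets. The following sketch is an illustration only: it assumes sets are modelled as nested Python `frozenset` values and ordered pairs as Kuratowski pairs $\{\{U\}, \{U, W\}\}$, it leaves out $F_6$, $F_7$, $F_8$ (which would need a fixed coding of triples), and the helper names `pair` and `unpair` are hypothetical.

```python
def pair(u, w):
    """Kuratowski ordered pair (u, w) = {{u}, {u, w}}."""
    return frozenset({frozenset({u}), frozenset({u, w})})


def unpair(p):
    """Return (u, w) if p is a Kuratowski pair, otherwise None."""
    if not isinstance(p, frozenset) or len(p) not in (1, 2):
        return None
    if not all(isinstance(m, frozenset) for m in p):
        return None
    small, big = min(p, key=len), max(p, key=len)
    if len(small) != 1 or not small <= big or len(big) > 2:
        return None
    (u,) = small
    if len(p) == 1:          # the pair (u, u) collapses to {{u}}
        return (u, u)
    rest = big - small
    if len(rest) != 1:
        return None
    (w,) = rest
    return (u, w)


def F1(x, y):
    """Unordered pair {x, y}."""
    return frozenset({x, y})


def F2(x, y):
    """Difference x \\ y."""
    return x - y


def F3(x, y):
    """Cartesian product x × y."""
    return frozenset(pair(u, w) for u in x for w in y)


def F4(x):
    """dom x: first components of the ordered pairs that belong to x."""
    return frozenset(uw[0] for uw in map(unpair, x) if uw is not None)


def F5(x):
    """The membership relation restricted to x."""
    return frozenset(pair(u, w) for u in x for w in x if u in w)
```

With `zero = frozenset()`, `one = frozenset({zero})` and `two = frozenset({zero, one})` as the first three von Neumann ordinals, `F1(zero, one)` returns `two`, and `F4(F3(one, two))` returns `one`, the domain of the product.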


We say that a set (or class) $Y$ is closed with respect to an operation $F$ of degree $r$ if we have $F(Z_1, \ldots, Z_r) \in Y$ for all $Z_1, \ldots, Z_r \in Y$ such that $F(Z_1, \ldots, Z_r)$ is defined. For every $X \in V$ we let $\mathcal{J}(X)$ denote the smallest set $Y \supset X$ that is closed with respect to the operations $F_1, \ldots, F_8$. It will later be shown (Section 1.4) that $\mathcal{J}(X)$ actually is a set. The following construction is analogous to the definition of $V$.


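1.2. Definition.

$$\begin{aligned}
L_0 &= \varnothing; \\
L_{\alpha+1} &= P(L_\alpha) \cap \mathcal{J}(L_\alpha \cup \{L_\alpha\}); \\
L_\alpha &= \bigcup_{\beta < \alpha} L_\beta, \quad \text{if } \alpha \text{ is a limit ordinal}; \\
L &= \bigcup_{\alpha} L_\alpha.
\end{aligned}$$

The elements of $L$ are called constructible sets.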
Yu. I. Manin, A Course in Mathematical Logic for Mathematicians, Second Edition, 151
The operations $F_1, \ldots, F_8$ and simple combinations of them, together with the transfinite recursion in the definition of $L$, exhaust the arsenal of primitive set-theoretic constructions used in mathematics. This can be seen by looking at Bourbaki’s “compendium of the results of set theory,” upon which all subsequent material in their voluminous treatise on the foundations of mathematics is based. The only way we could possibly (but not necessarily) leave $L$ would be to apply the axiom of choice. This could happen provided that $L$ is strictly less than $V$; but, as mentioned before, this question is undecidable in the Zermelo–Fraenkel axiom system (see also 5.16 below). Gödel was of the opinion that $L$ does not exhaust $V$, as are most specialists who accept the semantics of $L_1\mathrm{Set}$.

Of course, the constructibility of the elements of $L$ should not be understood in a finitistic sense. The sets we construct at the $(\alpha + 1)$th stage are only the subsets of $L_\alpha$ that are obtained from the elements of the sets $L_\alpha$ and $\{L_\alpha\}$ using the explicit constructions $F_i$. But when we consider all the ordinals indexing the stages, we see that $L$ is hopelessly infinite. Nevertheless, in many respects the construction of $L$ is simpler than that of $V$, and $L$ seems to provide a convenient framework for mathematics.

We now list some properties of $L$ that follow easily from the definitions. The specific nature of the operations $F_i$ plays a very secondary role in these properties.


1.3. $L_n = V_n$ for all $n \le \omega_0$. This is true for $L_0$. Suppose it is true for $L_n$. It is clear from the definition that $L_n \in L_{n+1}$ and $\{X\} \in L_{n+1}$ for all $X \in L_n$. Moreover, any subset of $L_n$ can be represented as a finite difference $(\cdots(L_n \setminus \{X_1\}) \setminus \{X_2\}) \setminus \cdots \setminus \{X_k\}$, where the $X_i \in L_n$ are the elements not in the given subset.


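1.4. $\operatorname{card} L_\alpha = \operatorname{card} \alpha$ for all infinite ordinals $\alpha$. In fact, for $X \in V$ let

$$\Phi(X) = X \cup \bigcup_{i=1}^{8} F_i(X),$$

where $F(X) = \{F(Y) \mid Y \in X\}$ is the image of $F$ restricted to the elements of $X$. Then $\mathcal{J}(X) = \bigcup_{n=0}^{\infty} \Phi^n(X)$. It is hence clear that $\operatorname{card} \mathcal{J}(X) = \operatorname{card} X$ if $X$ is infinite. We now prove the assertion 1.4 by induction on $\alpha$.

Obviously $\operatorname{card} L_\alpha \ge \operatorname{card} \alpha$. Suppose that $\alpha$ is the least infinite ordinal for which $\operatorname{card} L_\alpha > \operatorname{card} \alpha$. By 1.3, we have $\alpha > \omega_0$. Now $\alpha$ cannot be a limit ordinal, or we would have $\operatorname{card} L_\alpha = \operatorname{card} \bigcup_{\beta < \alpha} L_\beta = \operatorname{card} \alpha$. But the case $\alpha = \beta + 1$ is also impossible, since in that case $\operatorname{card} L_\alpha \le \operatorname{card} \mathcal{J}(L_\beta \cup \{L_\beta\}) = \operatorname{card} (L_\beta \cup \{L_\beta\}) = \operatorname{card} \beta = \operatorname{card} \alpha$.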

In particular, the result 1.4 shows that beginning with $\omega_0 + 1$, the inclusion $L_\alpha \subset V_\alpha$ becomes strict, since $\operatorname{card} V_{\omega_0 + 1} = 2^{\omega_0}$. Of course, this does not in principle exclude the possibility that $\forall \alpha\, \exists \beta > \alpha$, $L_\beta \supset V_\alpha$, but it seems that there is no such $\beta$ even for $\alpha = \omega_0 + 1$.

1.5. $L$ is transitive: $Y \in X \in L_\alpha \Rightarrow Y \in L_\alpha$, i.e., $L_\alpha \subset L_{\alpha+1}$. See Section 13 of the appendix to Chapter II; the proof is no different for $L$.


1.6. $L$ is a big class: by definition, this means that for any $X \in V$ with $X \subset L$ there exists a $Y \in L$ such that $X \subset Y$.

On $L$ we consider the function $\varphi(x)$ that is equal to the least $\alpha$ for which $x \in L_\alpha$. Let $X \in V$, $X \subset L$. We consider the map $\varphi$ restricted to $X$. By the replacement axiom, the values of $\varphi$ form some set $Y$. The elements of $Y$ are ordinals. Let $\beta = \bigcup Y$. Then for each $x \in X$ we have $\beta \ge \varphi(x)$, so that $X \subset L_\beta$.


Effective numbering of L by ordinals. 


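We order pairs of ordinals $(\alpha, \beta)$ by the relation

$$(\alpha_1, \beta_1) < (\alpha_2, \beta_2) \iff \begin{cases} \text{either } \max(\alpha_1, \beta_1) < \max(\alpha_2, \beta_2), \\ \text{or else these maxima are equal and } \alpha_1 < \alpha_2, \\ \text{or else these maxima are equal and } \alpha_1 = \alpha_2 \text{ and } \beta_1 < \beta_2. \end{cases}$$

Further, we order triples $(i, \alpha, \beta)$, where $i = 0, \ldots, 8$, by the relation

$$(i_1, \alpha_1, \beta_1) < (i_2, \alpha_2, \beta_2) \iff \begin{cases} \text{either } (\alpha_1, \beta_1) < (\alpha_2, \beta_2), \\ \text{or else } (\alpha_1, \beta_1) = (\alpha_2, \beta_2) \text{ and } i_1 < i_2. \end{cases}$$

We call these triples important.

1.7. Lemma. The class of important triples is well-ordered by the relation $<$.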


In addition, the following assertions hold: 


(a) The next triple after (i, a, 3B) has the form 


Gi+1, 0, B), fi<7; 

(0, o41, By, ¢i=8 andar1<é 
(0, a+1, 0), fi=8anda+1=f; 
(0, a, B+1), ff i=8 anda> BZ; 
(0,0, 6+1), #fi=8anda=Q8. 


(b) Limit triples have the form 


(0, Qa, GB), 


if a+1< 6 and ais a limit ordinal: 


this is the limit of (i, y, B),y <a; 


if ais a limit ordinal: this is the limit of (i, 7, a),y < a3 


(0, a, 0), 

(0, a, 6), if a> and B is a limit ordinal: 
this is the limit of (i,a,7),y < B; 

(0, 0, 8), if B is a limit ordinal: this is the li 


y< fp. 


mit of (i, a, fee a<f, 
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ProoF. The proof follows immediately from the definitions. We shall illustrate 
this by showing explicitly how to find the least triple in any nonempty class C 
of triples. We set 


7 =min{max(a, )|(i, a, @) € Ch; 

Cy = {(i, a, 8) € C|max(a, 8) = 7}. 
If C does not contain any triples of the form (i, a, y), then let Go be the 
minimum of the third coordinates of triples in Cy, and let ig be the least 7 such 
that (i, y, 80) € Cy. Then (io, y, Go) is the least triple in C. Otherwise, let 
c, consist of triples of the form (i, a, y) € Cy, let ago be the minimum of the 


second coordinates in C, and let io be the least i such that (i, ao, y) € Cy. 
Then (to, @o, ¥) is the least triple in C. 
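The ordering and the successor rule of Lemma 1.7(a) can be checked mechanically on finite ordinals. The following is only a sketch: it assumes ordinals are modeled by Python integers (so limit ordinals do not occur), and the helper names `triple_key`, `successor`, `initial_segment`, and `sorted_triples` are hypothetical, introduced for illustration.

```python
from itertools import product

def triple_key(triple):
    """Sort key realizing the order on triples (i, alpha, beta)."""
    i, alpha, beta = triple
    # Pairs are compared by maximum, then first coordinate, then second;
    # the index i only breaks ties between equal pairs.
    return (max(alpha, beta), alpha, beta, i)

def successor(triple):
    """Next triple, following the five cases of Lemma 1.7(a)."""
    i, alpha, beta = triple
    if i <= 7:
        return (i + 1, alpha, beta)
    if alpha + 1 < beta:
        return (0, alpha + 1, beta)
    if alpha + 1 == beta:
        return (0, alpha + 1, 0)
    if alpha > beta:
        return (0, alpha, beta + 1)
    return (0, 0, beta + 1)  # remaining case: alpha == beta

def initial_segment(length):
    """First `length` triples, generated by iterating `successor`."""
    out, t = [], (0, 0, 0)
    for _ in range(length):
        out.append(t)
        t = successor(t)
    return out

def sorted_triples(bound):
    """All triples with alpha, beta < bound, sorted by the order above."""
    triples = product(range(9), range(bound), range(bound))
    return sorted(triples, key=triple_key)
```

Since the maximum of the pair is compared first, the triples with both ordinals below a fixed bound form an initial segment, so sorting them should reproduce exactly what iterating `successor` from $(0, 0, 0)$ produces.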


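The exact form of assertions (a) and (b) will be needed only in §5. The lemma implies that there exists a unique order-preserving isomorphism

\[
K : \{\text{ordinals}\} \xrightarrow{\ \sim\ } \{\text{important triples}\}.
\]

Using this isomorphism, we recursively define a numbering mapping

\[
N : \{\text{ordinals}\} \to L.
\]

Since we have $\alpha < \gamma$ and $\beta < \gamma$ if $\gamma > 0$, $i > 0$, and $K(\gamma) = (i, \alpha, \beta)$, we may set

\[
N(\gamma) =
\begin{cases}
L_\alpha, & \text{for } i = 0; \\
F_i(N(\alpha), N(\beta)), & \text{for } i = 1, 2, 3; \\
F_i(N(\alpha)), & \text{for } i = 4, 5, 6, 7, 8.
\end{cases}
\]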


1.8. Lemma. 
(a) The mapping N is correctly defined. 
(b) The image of N coincides with all of L. 
PROOF. 

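(a) To verify correctness, it suffices to show that $L_\alpha \in L$ and that the class $L$ is closed with respect to the operations $F_i$. In fact, then induction on $\gamma$ shows that $N(\gamma) \in L$ if $N(\alpha) \in L$ for all $\alpha < \gamma$.

Let $X, Y \in L_\alpha$. Since $L$ is transitive (see 1.5), we easily find that $F_1(X, Y)$, $F_2(X, Y)$, and $F_4(X)$ belong to $\mathcal{P}(L_\alpha)$, and hence to $L_{\alpha+1}$. For example,

\[
U \in F_4(X) \Rightarrow \exists W,\ \langle U, W \rangle \in L_\alpha \Rightarrow \{U\} \in L_\alpha \Rightarrow U \in L_\alpha.
\]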


Further, $X \times Y$ is a subset of the ordered pairs of elements in $L_\alpha$. We showed that the unordered pairs lie in $L_{\alpha+1}$, so that the ordered pairs lie in $L_{\alpha+2}$, and finally $X \times Y \in L_{\alpha+3}$ and $F_5(X) \in L_{\alpha+4}$. Analogously, the elements of $F_i(X)$ for $i = 6, 7, 8$ are ordered triples of elements in $L_\alpha$, so that $F_i(X) \in L_{\alpha+6}$.


(b) Let $Z$ be the image of $N$. We show by induction on $\alpha$ that $L_\alpha \subset Z$. If $\alpha$ is a limit ordinal and $L_\gamma \subset Z$ for each $\gamma < \alpha$, then also $L_\alpha = \bigcup_{\gamma < \alpha} L_\gamma \subset Z$. Suppose $\alpha = \beta + 1$ and $L_\beta \subset Z$, and let $X \in L_\alpha$. Then $X \in \Phi^n(L_\beta \cup \{L_\beta\})$ for some $n$, and we show that $X \in Z$ by induction on $n$.

(b1) $n = 0$. Then either $X \in L_\beta$, so $X \in Z$ by the induction hypothesis, or else $X = L_\beta$, in which case $X = N(\gamma)$ for $\gamma$ such that $K(\gamma) = (0, \beta, 0)$.

(b2) $n > 0$. Let $X = F_i(Y_1, Y_2)$, $i = 1, 2, 3$; $Y_1, Y_2 \in \Phi^{n-1}(L_\beta \cup \{L_\beta\})$. By the induction hypothesis, $Y_1 = N(\gamma_1)$ and $Y_2 = N(\gamma_2)$ for some ordinals $\gamma_1, \gamma_2$. Therefore $X = N(\gamma)$, where $K(\gamma) = (i, \gamma_1, \gamma_2)$.

Let $X = F_i(Y)$, $i = 4, \ldots, 8$; $Y \in \Phi^{n-1}(L_\beta \cup \{L_\beta\})$. The verification is analogous.

The lemma is proved.


In §3 the numbering $N$ will allow us to prove that a strong form of the axiom of choice is $L$-true. The fundamental step in the proof is to choose the element with the least $N$-number in each constructible set.


2 Definability and Absoluteness 


2.1. Let $M \subset V$ be a nonempty class, and let $P$ be a formula in $L_1\mathrm{Set}$. As in §7 of Chapter II, we shall consider the truth values $|P|_M(\xi)$ for $\xi \in \overline{M}$, where we take the standard interpretation of $L_1\mathrm{Set}$ in $V$ restricted to $M$. We then say that the formula $P$ is $M$-true if $|P|_M(\xi) = 1$ for all $\xi$.

We shall also consider formulas "with constants in $M$," where we assume that the language $L_1\mathrm{Set}$ has been extended so that its alphabet includes names for all the elements of $M$. We shall designate these elements by the same letters as in the metalanguage ($X, Y, \ldots$ for sets; $\alpha, \beta, \ldots$ for ordinals, etc.), which we hope will not lead to confusion. We extend the definition of $|P|_M(\xi)$ to formulas with constants in $M$ in the obvious way: we take $X^\xi = X$ for any constant $X$ and any point $\xi$.


2.2. Definition. Let $X_i \in M$, $i = 1, \ldots, n$. Sets of the form

\[
\{\langle y_1^\xi, \ldots, y_n^\xi \rangle \mid \xi \in \overline{M},\ y_i^\xi \in X_i \text{ for } i = 1, \ldots, n;\ |P|_M(\xi) = 1\} \subset X_1 \times \cdots \times X_n
\]

are called $M$-definable sets. Here $P$ runs through all formulas with constants in $M$ and free variables in the set $\{y_1, \ldots, y_n\}$.

If $P(y_1, \ldots, y_n; Z_1, \ldots, Z_m)$ is such a formula (where the notation shows the constants and free variables) and if $y_i^\xi = Y_i$, we shall often write "$P(Y_1, \ldots, Y_n, Z_1, \ldots, Z_m)$ is $M$-true" instead of $|P|_M(\xi) = 1$.

The next proposition, which, in particular, is applicable to L, is a basic 
instrument for proving many assertions about L. 


2.3. Proposition. Let $M \subset V$ be a transitive big class (see 1.6) that is closed with respect to the operations $F_1, \ldots, F_8$. Then all $M$-definable sets are elements of $M$.

PROOF. The proof is by induction on the number of connectives and quantifiers in the defining formula $P$.

(a) $P(y_1, \ldots, y_n; Z_1, \ldots, Z_m)$ is an atomic formula. It can have one of eight possible forms: the predicate can be either $\in$ or $=$, and on each side of $\in$ or $=$ we can have either a constant or a variable. But all of these cases reduce to two: $y_i \in y_j$ and $y_i \in Z$, if we are willing to make the formula a little more complicated. For example, since $M$ is transitive, we have

"$y = Z$" defines the same set as $\forall z(z \in Z \Leftrightarrow z \in y)$,
"$Z \in y$" defines the same set as $\exists z(z = Z \wedge z \in y)$,

and so on. We therefore analyze these two basic cases.

(a1) $y_i \in Z$. We have $Z \cap X_i = Z \setminus (Z \setminus X_i) \in M$, since $Z$ and $X_i \in M$, and $M$ is $F_2$-closed; and we have $X_1 \times \cdots \times X_{i-1} \times (Z \cap X_i) \times \cdots \times X_n \in M$, since $M$ is $F_3$-closed. This last set is $M$-definable by the formula $y_i \in Z$, because $M$ is transitive.


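(a2) $y_i \in y_j$. We use induction on $n \geq 3$. Let

\[
Y = \{\langle Y_1, \ldots, Y_n \rangle \mid Y_k \in X_k \text{ for } k = 1, \ldots, n;\ Y_i \in Y_j\}.
\]

The case $(i, j) = (n-1, n)$. Let $X_{n-1} \cup X_n \subset X \in M$. Then

\[
Y = F_6\bigl(F_5(X) \times (X_1 \times \cdots \times X_{n-2}) \cap (X_{n-1} \times X_n) \times (X_1 \times \cdots \times X_{n-2})\bigr).
\]

The case $(i, j) = (n, n-1)$. Again let $X_{n-1} \cup X_n \subset X \in M$. Then

\[
Y = F_7\bigl(F_5(X) \times (X_1 \times \cdots \times X_{n-2}) \cap (X_{n-1} \times X_n) \times (X_1 \times \cdots \times X_{n-2})\bigr).
\]

The case $n \notin \{i, j\}$. By the induction assumption, the set $Y'$, which is $M$-defined by the formula $y_i \in y_j$ in $X_1 \times \cdots \times X_{n-1}$, lies in $M$. But $Y = Y' \times X_n$.

The case $n - 1 \notin \{i, j\}$. Let $Y'$ be $M$-defined by the formula $y_i \in y_j$ in $X_1 \times \cdots \times X_{n-2} \times X_n$. Then $Y = F_8(Y' \times X_{n-1})$.

The case $n = 2$ reduces to the case $n = 3$ by taking the direct product with $\{\varnothing\}$ and projecting. The projection of $X_1 \times \cdots \times X_n$ onto $X_1$ is $F_4 \circ \cdots \circ F_4$ ($n - 1$ times).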


(b) Connectives. $\wedge$ corresponds to intersection, and $\neg$ corresponds to taking the complement (relative to $X_1 \times \cdots \times X_n$). $M$ is closed with respect to these operations, and the other connectives can be expressed in terms of these two.

(c) Quantifiers. It suffices to verify $\exists$. This corresponds to projecting, because $M$ is a big class. More precisely, let $Y$ be $M$-defined by the formula $\exists y_{n+1} P(y_1, \ldots, y_n, y_{n+1})$ in $X_1 \times \cdots \times X_n$. We have

\[
\langle Y_1, \ldots, Y_n \rangle \in Y \Leftrightarrow \text{there exists a } Y_{n+1} \in M \text{ such that } P(Y_1, \ldots, Y_{n+1}) \text{ is } M\text{-true}.
\]

To each $\langle Y_1, \ldots, Y_n \rangle \in X_1 \times \cdots \times X_n$ we associate the least ordinal $\alpha$ for which there exists $Y_{n+1} \in M \cap V_\alpha$ such that $P(Y_1, \ldots, Y_{n+1})$ is $M$-true, if there is such a $Y_{n+1}$. This gives rise to a function on $Y \subset X_1 \times \cdots \times X_n$. Let $A$ be the set of its values, and let $\beta = \bigcup A$. Then $X = M \cap V_\beta$ is a set, and $X \subset M$. Since $M$ is a big class, there exists $X_{n+1} \in M$ such that $X \subset X_{n+1}$. By the induction assumption, the $M$-definable subset $Y' \subset X_1 \times \cdots \times X_n \times X_{n+1}$ consisting of those points $\langle Y_1, \ldots, Y_{n+1} \rangle$ for which $P(Y_1, \ldots, Y_{n+1})$ is $M$-true belongs to $M$. But $Y = F_4(Y')$, and $M$ is closed under $F_4$.

The proposition is proved.
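Step (c) of the proof says that an existential quantifier corresponds to a projection, realized by $F_4$. The sketch below illustrates this on small finite domains. It assumes that tuples are built as nested Python pairs `((y1, y2), y3)` rather than as set-theoretic ordered pairs, and the names `definable` and `dom` are hypothetical. It does not model the "big class" argument that bounds the witnesses; here the witness domain is simply given.

```python
from itertools import product

def definable(domains, predicate):
    """Subset of the product of `domains` cut out by `predicate`.

    Tuples are stored as nested pairs ((y1, y2), y3), so that the
    last coordinate is the one removed by `dom`.
    """
    result = set()
    for values in product(*domains):
        if predicate(*values):
            nested = values[0]
            for v in values[1:]:
                nested = (nested, v)
            result.add(nested)
    return frozenset(result)

def dom(relation):
    """F_4: the set of first components of the pairs in `relation`."""
    return frozenset(u for (u, _w) in relation)
```

For instance, with `X = range(5)` and `P(y1, y2, y3)` meaning `y1 + y3 == y2`, the set defined by "there exists `y3` with `P`" in `X × X` coincides with `dom` of the set defined by `P` in `X × X × X`.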


In order to be able to use Proposition 2.3, we need criteria for verifying $M$-truth. As remarked in §7 of Chapter II, the basic technical tool for this is the notion of absoluteness. A formula $P$ is called $M$-absolute ($(M, V)$-absolute in the terminology of Chapter I) if $|P|_M(\xi) = |P|_V(\xi)$ for all $\xi \in \overline{M} \subset \overline{V}$. The standard method of proving that a formula is $M$-true is to prove that it is $V$-true and $M$-absolute.

The following lemma provides us with a large class of $M$-absolute formulas.


2.4. Lemma. 


(a) Atomic formulas are M-absolute for all M. 

(b) If the formulas $P$, $P_1$, and $P_2$ are $M$-absolute, then so are the formulas $\neg P$ and $P_1 * P_2$ (where $*$ is any connective).

(c) Suppose that the class $M$ is transitive, and is closed with respect to an operation $f$ of degree $r$. If the formula $P$ is $M$-absolute, then the "restricted quantifier" formulas

\[
\forall x(x \in f(y_1, \ldots, y_r) \Rightarrow P), \qquad \exists x(x \in f(y_1, \ldots, y_r) \wedge P)
\]

are also $M$-absolute.


PROOF. Part (c) is the only assertion that might not be completely obvious. Before proving it, we make one remark. The formula $x \in f(y_1, \ldots, y_r)$ is written in a suitable extension of $L_1\mathrm{Set}$, and may be assumed to be $V$-equivalent to some formula $P(x, y_1, \ldots, y_r)$ in $L_1\mathrm{Set}$ (with constants in $M$) for which $\forall y_1 \cdots \forall y_r \exists! x\, P$ or a restricted version of this formula is deducible from the Zermelo-Fraenkel axioms. This $P$ determines the operation $f$. We also allow the case $r = 0$; then $f$ is simply a constant in $M$. We shall identify $f$ with its standard interpretation, i.e., we shall denote terms by $f(Y_1, \ldots, Y_r) \in M$ for $Y_1, \ldots, Y_r \in M$.

Now let $\xi \in \overline{M}$, $y_i^\xi = Y_i \in M$, $Q = \exists x(x \in f(y_1, \ldots, y_r) \wedge P)$, $Y = f(Y_1, \ldots, Y_r) \in M$. Then

\[
|Q|_M(\xi) = \sup_{X \in M} \bigl(|X \in Y|_M \cdot |P|_M(\xi')\bigr),
\]

where the $\xi' \in \overline{M}$ are variations of $\xi$ along $x$ such that $x^{\xi'} = X$. Since $P$ is absolute, it follows that $|P|_M(\xi') = |P|_V(\xi')$, and since $M$ is transitive, it follows that if $X \notin M$, then $|X \in Y|_M = |X \in Y|_V = 0$. Hence, on the right we can write $V$ everywhere in place of $M$ and can let $\xi'$ run through all variations of $\xi$ along $x$ in $\overline{V}$ with $x^{\xi'} = X$. The resulting expression equals $|Q|_V(\xi)$.

The quantifier $\forall$ can be handled analogously, or else can be reduced to $\exists$. The lemma is proved.


We shall abbreviate the restricted quantifier formulas in 2.4(c) as

\[
(\forall x \in f(y_1, \ldots, y_r))P, \qquad (\exists x \in f(y_1, \ldots, y_r))P,
\]

respectively.

If all the quantifiers in a formula $Q$ are restricted in this way, we say that $Q$ is a $\Sigma_0$-formula.

As a first application of the results in 2.3 and 2.4, we prove the following fact.


2.5. Proposition. All ordinals are constructible. 


PROOF. Suppose that this is not the case, and that $\beta$ is the least nonconstructible ordinal. All of the elements in $\beta$ are contained in $L_\gamma$ for some $\gamma$. Since $L$ is transitive, it follows that every ordinal greater than or equal to $\beta$ is nonconstructible. Hence,

\[
\beta = \{x \mid (x \text{ is an ordinal} \wedge x \in L_\gamma) \text{ is } V\text{-true}\}.
\]

If we show that "$V$-true" may be replaced by "$L$-true" here, we immediately have a contradiction, since then $\beta \in L$ by Proposition 2.3.

To do this, it suffices to verify that the formula "$x$ is an ordinal" is $L$-absolute. Using the regularity axiom, from which $\neg(y \in y)$ is deducible, we can write this formula in the following $\Sigma_0$-form:

\[
(\forall y \in x)(\forall z \in y)(z \in x) \wedge (\forall y_1 \in x)(\forall y_2 \in x)(y_1 \in y_2 \vee y_2 \in y_1 \vee y_1 = y_2)
\]

and then apply Lemma 2.4.
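Because every quantifier in this formula ranges over elements of $x$ (or of an element of $x$), it can be evaluated by looking only inside $x$. The following sketch does this for hereditarily finite sets, modeled as nested `frozenset` values; the names `von_neumann` and `is_ordinal` are hypothetical and are not part of the text.

```python
def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1} as a nested frozenset."""
    ordinal = frozenset()
    for _ in range(n):
        ordinal = ordinal | frozenset([ordinal])
    return ordinal

def is_ordinal(x):
    """Evaluate the restricted-quantifier formula on a hereditarily finite x."""
    # First conjunct: every element of an element of x is an element of x.
    transitive = all(z in x for y in x for z in y)
    # Second conjunct: any two elements of x are comparable by membership
    # or are equal.
    connected = all(
        y1 in y2 or y2 in y1 or y1 == y2 for y1 in x for y2 in x
    )
    return transitive and connected
```

For example, $\{\varnothing, \{\varnothing\}, \{\{\varnothing\}\}\}$ is transitive but fails the second conjunct, so it is rejected, while each `von_neumann(n)` is accepted.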


3 The Constructible Universe as a Model for Set Theory 


3.1. Theorem. The Zermelo—Fraenkel axioms are L-true. 


PROOF. The general principle for verifying the axioms is to note that every set whose existence is stipulated in a given axiom can be represented as a set defined by a $\Sigma_0$-formula with constants in $L$. We only occasionally have to perform a direct verification that a subformula is $L$-absolute.

(a) Empty set. This axiom is equivalent to the $\Sigma_0$-formula $\neg\exists x(x \in \varnothing)$, which is $V$-true.

(b) Extensionality. This axiom can be represented in $\Sigma_0$-form. In addition, in Section 4.8 of Chapter II we verified this axiom for any transitive class.

(c) Pairing. A direct computation of the $L$-truth function gives 1, since $L$ is closed with respect to forming pairs.

(d) Regularity. This follows by a direct computation using the transitivity of $L$.

(e) Union. Here it is somewhat more complicated to reduce the axiom to a $\Sigma_0$-formula. The axiom is written in the form

\[
\forall x\, \exists y\, \forall u(\exists z(u \in z \wedge z \in x) \Leftrightarrow u \in y).
\]

Let $\xi \in \overline{L}$, let $\xi'$ be any variation of $\xi$ along $x$, and let $X = x^{\xi'} \in L$. We must show that

\[
|\exists y\, \forall u(\exists z(u \in z \wedge z \in X) \Leftrightarrow u \in y)|_L(\xi') = 1.
\]

It suffices to find a $Y \in L$ such that

\[
|\forall u(\exists z(u \in z \wedge z \in X) \Leftrightarrow u \in Y)|_L = 1,
\]

i.e., such that for all $U \in L$,

\[
|(\exists z \in X)(U \in z)|_L = |U \in Y|_L.
\]

We can clearly take $Y = \bigcup_{Z \in X} Z$ if we show that $Y$ is constructible. Since $L$ is transitive, we know that all the elements of $Y$ are constructible. Hence, there exists a constructible set $Y'$ such that $Y' \supset Y$. Then $Y$ can be represented as follows (where we replace $V$-truth by $L$-truth using Lemma 2.4):

\[
Y = \{U \mid U \in Y';\ (\exists z \in X)(U \in z) \text{ is } L\text{-true}\}.
\]

Now the required assertion follows by Proposition 2.3.

In what follows we shall usually omit explicit mention of the points $\xi \in \overline{L}$.
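(f) Power set axiom $\forall x\, \exists y\, \forall z(z \subset x \Leftrightarrow z \in y)$. We fix $X \in L$, form the set $Y = \mathcal{P}(X) \cap L$ of constructible subsets of $X$, and show that $Y$ is constructible. In fact, let $Y' \supset Y$, where $Y'$ is constructible. Then by Lemma 2.4,

\[
Y = \{Z \mid Z \in Y';\ (Z \subset X) \text{ is } L\text{-true}\},
\]

because $Z \subset X$ has the $\Sigma_0$-form $(\forall z \in Z)(z \in X)$. Now a direct computation gives

\[
|\forall z(z \subset X \Leftrightarrow z \in Y)|_L = 1.
\]

(g) Infinity. This axiom is $L$-true because of the constructibility of the set $\{\varnothing, \{\varnothing\}, \{\{\varnothing\}\}, \ldots\}$, which can be represented in the form

\[
\{Y \mid Y \in L_{\omega_0};\ [Y = \varnothing \vee (\exists y \in L_{\omega_0})(Y = \{y\})] \text{ is } L\text{-true}\}.
\]

(h) Replacement. Let $\bar{z} = (z_1, \ldots, z_n)$. This axiom is written in the form

\[
\forall \bar{z}\, \forall u\bigl(\forall x(x \in u \Rightarrow \exists! y\, P(x, y, \bar{z})) \Rightarrow \exists w\, \forall y(y \in w \Leftrightarrow \exists x(x \in u \wedge P(x, y, \bar{z})))\bigr).
\]

We fix $Z_1, \ldots, Z_n \in L$, $\bar{Z} = (Z_1, \ldots, Z_n)$, and $U \in L$. It is sufficient to consider the case that the premise is $L$-true, i.e., for all $X \in L$,

\[
|X \in U \Rightarrow \exists! y\, P(X, y, \bar{Z})|_L = 1.
\]

We must find a value $W \in L$ of $w$ for which the conclusion is $L$-true. We set $W'$ = a constructible set containing as elements all constructible $Y$ for which

\[
(\exists x \in U) P(x, Y, \bar{Z}) \text{ is } L\text{-true}.
\]

This set exists because, since the premise of the axiom is $L$-true, each $X \in U$ corresponds to at most one constructible $Y$. We then set

\[
W = \{Y \mid Y \in W';\ (\exists x \in U)(P(x, Y, \bar{Z})) \text{ is } L\text{-true}\}.
\]

This set is constructible by Proposition 2.3, and it follows from the way it is defined that

\[
|\forall y(y \in W \Leftrightarrow \exists x(x \in U \wedge P(x, y, \bar{Z})))|_L = 1.
\]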
(i) Axiom of choice. The main intuitive point in the verification is the numbering 
N of the universe L that was constructed in 1.8. But the formal verification 
is much more complicated here than in the previous cases. A fair amount of 
work is needed to give a formalization of the construction in 1.7-1.8 that is 
sufficiently detailed to prove the following fact: 


3.2. Proposition. There exists a formula $N(x, y)$ in $L_1\mathrm{Set}$ with two free variables such that

(a) For any $X, Y \in V$, the formula $N(X, Y)$ is $V$-true if and only if $X$ is an ordinal and $Y = N(X)$.
(b) $N(x, y)$ is $L$-absolute.


We shall postpone the proof until §5, and shall make use of this proposition 
to verify the axiom of choice. We divide this verification into two steps. 


3.3. UNIVERSAL CHOICE FUNCTION. Let $X \in L$ be a nonempty set. We construct the function $Y$ that for every nonempty $Z \in X$ chooses the element $U$ in $Z$ with the least $N$-number (see 1.8):

\[
Y = \Bigl\{\langle Z, U \rangle \Bigm| Z \in X,\ U \in \bigcup_{X' \in X} X';\ U \in Z \wedge \exists w\bigl(N(w, U) \wedge \forall z(z \in Z \Rightarrow (z = U \vee \forall w'(N(w', z) \Rightarrow w \in w')))\bigr) \text{ is } V\text{-true}\Bigr\}.
\]

We want to prove that $Y \in L$. By Proposition 2.3, this holds if we can define $Y$ by means of the $L$-truth of a formula. We are not allowed mechanically to replace $V$ by $L$, since it is not immediately obvious from its external form that this formula is $L$-absolute. We proceed as follows: taking into account the constructibility of the ordinals, we take all ordinals that occur as the least $N$-numbers of the elements of the constructible set $\bigcup_{X' \in X} X' = \bigcup(X)$, and we find a constructible set $W$ that contains these ordinals. Then we replace $\exists w$ by $\exists w \in W$ and $\forall w'$ by $\forall w' \in W$ in the formula. The set $Y$ does not change, and now $V$-truth may be replaced by $L$-truth, as can be seen using Proposition 3.2 and Lemma 2.4.
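The idea of 3.3, choosing from each nonempty set the element with the least number, can be sketched on finite data. This is only an illustration: the list `enumeration` stands in for the numbering $N$ (which, like $N$, may list an element more than once), and the names `least_numbers` and `choice_function` are hypothetical.

```python
def least_numbers(enumeration):
    """Least index at which each element occurs (elements may repeat)."""
    first = {}
    for index, element in enumerate(enumeration):
        first.setdefault(element, index)
    return first

def choice_function(family, enumeration):
    """Send each nonempty Z in `family` to its element of least number."""
    first = least_numbers(enumeration)
    return {Z: min(Z, key=first.__getitem__) for Z in family if Z}
```

The resulting dictionary plays the role of the set $Y$ of pairs $\langle Z, U \rangle$; empty members of the family are skipped, matching the restriction to nonempty $Z$.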

3.4. We now compute the $L$-truth value of the axiom of choice:

\[
\forall x(x \neq \varnothing \Rightarrow \exists y(y \text{ is a function} \wedge \operatorname{dom} y = x \wedge (\forall z \in x)(z \neq \varnothing \Rightarrow y(z) \in z))).
\]

It suffices to show that if we take a nonempty $X \in L$ and the constructible choice function $Y \in L$ in 3.3, then

\[
|Y \text{ is a function}|_L = |\operatorname{dom} Y = X|_L = |(\forall z \in X)(z \neq \varnothing \Rightarrow Y(z) \in z)|_L = 1.
\]

The third formula here is $V$-true, and is written in $\Sigma_0$-form except for the subformula $Y(z) \in z$, which can be replaced by $(\forall u \in \bigcup(Y))(\langle z, u \rangle \in Y \Rightarrow u \in z)$. Thus, the third formula is $L$-absolute and hence $L$-true.

We verify that the first two formulas are absolute in §5. They are $V$-true by construction. This completes the proof of Theorem 3.1.


We note that the same argument shows the following: all the axioms, with the possible exception of the axiom of choice, are $M$-true for any transitive big class $M$ that is closed with respect to the operations $F_1, \ldots, F_8$.


4 The Generalized Continuum Hypothesis Is L-True 


4.1. We wish to show that the assertion "$\operatorname{card} \mathcal{P}(\omega_\alpha) = \omega_{\alpha+1}$" is $L$-true. A certain amount of caution is essential here, because cardinality is not an $L$-absolute notion. If $Y$ is a constructible set, let $\operatorname{card}_L(Y)$ be the least ordinal $\beta$ for which there exists in $L$ a one-to-one onto function $f : Y \to \beta$. Hence "$\operatorname{card}(Y) = \operatorname{card}(Z)$" is $L$-true iff $\operatorname{card}_L(Y) = \operatorname{card}_L(Z)$. Note that although $\operatorname{card}_L(Y) \geq \operatorname{card}(Y)$, equality fails if there are one-to-one onto functions $Y \to \beta$ in $V$, but no such function lies in $L$. The cardinal $\omega_\alpha$ in $L$ is the $\alpha$th ordinal $\beta \geq \omega_0$ such that $\operatorname{card}_L(\beta) = \beta$. Thus $\omega_\alpha$ in $L$ may not coincide with the "real" $\omega_\alpha$, that is, with $\omega_\alpha$ in $V$.

We shall show that for each ordinal $\beta$ and each constructible $X \subset \beta$ there is an ordinal $\gamma$ with $X \in L_\gamma$ and $\operatorname{card}_L(\gamma) = \operatorname{card}_L(\beta)$. Hence $\mathcal{P}(\beta) \cap L \subset L_{\beta^+}$, where $\beta^+$ is the least ordinal greater than $\beta$ such that $\operatorname{card}_L(\beta^+) \neq \operatorname{card}_L(\beta)$. The $L$-truth of the generalized continuum hypothesis will then follow if we show the $L$-truth of "$\operatorname{card}(L_{\beta^+}) = \beta^+$."

Our proof exploits throughout a proposition that requires a good deal of work formalizing the construction of $L$ within $L_1\mathrm{Set}$.


4.2. Proposition. There exists a formula $L(x, y)$ of $L_1\mathrm{Set}$ with two independent variables such that

(a) for any $X$ and $Y$ in $V$, $L(X, Y)$ is $V$-true $\Leftrightarrow$ $Y$ is an ordinal and $X \in L_Y$;

(b) for any transitive model $M \subset V$ of the axioms (without the axiom of choice), the formula $L(x, y)$ is $M$-absolute. In particular, it is $L$-absolute.


We again postpone the proof until §5. 


4.3. Lemma. Let $X \subset \beta$ be constructible. Then $X \in L_\gamma$ for some ordinal $\gamma$ such that $\operatorname{card}_L(\gamma) = \operatorname{card}_L(\beta)$.

PROOF. In this deduction, in addition to Proposition 4.2 we use versions of Propositions 7.3 and 7.6 of Chapter II that apply to the constructible universe. They are formulated precisely and proved below, in Sections 4.5 and 4.6.

Suppose that $X \subset \beta$ is constructible. Let $\delta$ be an ordinal such that $X \in L_\delta$. We enlarge the alphabet of $L_1\mathrm{Set}$ by adding names $\boldsymbol{\delta}$ and $\mathbf{X}$ for $\delta$ and $X$. Let $\mathcal{E}$ be the set of formulas

\[
\{\text{axioms of } L_1\mathrm{Set}\} \cup \{L(\mathbf{X}, \boldsymbol{\delta})\}.
\]

Let $N_0 \subset L$ be the set $\beta \cup \{X\} \cup \{\delta\}$. By Proposition 4.5 there is a constructible set $N$ such that $N_0 \subset N$, all formulas in $\mathcal{E}$ are $(N, L)$-absolute, and $\operatorname{card}_L(N) = \operatorname{card}_L(\beta)$. Thus $(N, \in)$ is a model for the axioms and, by Proposition 4.2(a), for $L(\mathbf{X}, \boldsymbol{\delta})$. Now $N$ might not be transitive, but then by Proposition 4.6 there are a transitive axiom model $(M, \in)$ and a constructible isomorphism $f : (N, \in) \to (M, \in)$. Hence $L(\mathbf{X}, \boldsymbol{\delta})$ is $M$-true and $\operatorname{card}_L(M) = \operatorname{card}_L(N)$. What are the interpretations of the constants $\mathbf{X}$ and $\boldsymbol{\delta}$ in $M$?

Since the set $\beta \subset N$ is transitive, it goes to itself under the isomorphism $f$; hence so does the set $X \subset \beta$. Let $\delta_M$ be the image of $\delta$ under $f$. Since by Proposition 4.2(b) the formula $L(x, y)$ is $M$-absolute, and $L(X, \delta_M)$ is $M$-true, it follows that $L(X, \delta_M)$ is $V$-true, so that $\delta_M$ is an actual ordinal and $X \in L_{\delta_M}$. Moreover, since $\delta_M \in M$ and $M$ is transitive, $\delta_M \subset M$; hence $\operatorname{card}_L(\delta_M) \leq \operatorname{card}_L(M)$. Letting $\gamma$ be the larger of $\delta_M$ and $\beta$, we have $\operatorname{card}_L(\gamma) = \operatorname{card}_L(\beta)$ and $X \in L_\gamma$. The lemma is proved.


4.4. DEDUCTION THAT THE GCH IS L-TRUE FROM THE LEMMA. Let $\beta^+$ be the smallest ordinal greater than $\beta$ such that $\operatorname{card}_L(\beta^+) \neq \operatorname{card}_L(\beta)$. Then Lemma 4.3 implies the $V$-truth of the formula

\[
\forall z(z \in L \Rightarrow (z \subset \beta \Rightarrow z \in L_{\beta^+})).
\]

Since "$z \in L_{\beta^+}$" (i.e., the formula $L(z, \beta^+)$) is $L$-absolute, it follows that

\[
\forall z(z \subset \beta \Rightarrow z \in L_{\beta^+})
\]

is $L$-true. Now if $\beta$ is the cardinal $\omega_\alpha$ in $L$ then $\beta^+$ is the cardinal $\omega_{\alpha+1}$ in $L$. Hence for each $\alpha$ we have shown the $L$-truth of

\[
\mathcal{P}(\omega_\alpha) \subset L_{\omega_{\alpha+1}}.
\]

We claim that the following formula is also $L$-true:

\[
\operatorname{card}(L_{\omega_{\alpha+1}}) = \omega_{\alpha+1}.
\]

Since "$\operatorname{card}(\mathcal{P}(\omega_\alpha)) \leq \omega_{\alpha+1}$" is formally deducible in $L_1\mathrm{Set}$ from the preceding two formulas, and since all the axioms are $L$-true, this will show that the GCH is $L$-true.

Our claim is verified thus: In Section 1.4 we proved that $\operatorname{card}(L_\gamma) = \operatorname{card}(\gamma)$ for each ordinal $\gamma$. Indeed, that proof can be formalized in $L_1\mathrm{Set}$, using the formula $L(x, y)$ of Proposition 4.2. That is, the assertion "$\forall \gamma(\operatorname{card}(L_\gamma) = \operatorname{card}(\gamma))$" is deducible from the axioms (see 5.17). Since the axioms are $L$-true, this assertion is then $L$-true. But since "$\operatorname{card}(\omega_{\alpha+1}) = \omega_{\alpha+1}$" is trivially $L$-true, the claim follows. This completes the proof.


4.5. Proposition. Let $\mathcal{E}$ be a constructible countable set of $L$-true formulas in the language $L_1\mathrm{Set}$, and let $M_0$ be a constructible set. Then there exists a constructible set $M \supset M_0$, $\operatorname{card}_L(M) \leq \operatorname{card}_L(M_0) + \omega_0$, such that all of the formulas in $\mathcal{E}$ are $(M, L)$-absolute.

PROOF. The general scheme is the same as in Section 7.3 of Chapter II, but some additional precautions are required. The main point is to prove that if $P(x, \bar{y})$, $\bar{y} = (y_1, \ldots, y_n)$, is a formula in $\mathcal{E}$, then there exists a constructible set $M \supset M_0$ with $\operatorname{card}_L(M) \leq \operatorname{card}_L(M_0) + \omega_0$ that can be constructed constructibly from $P$ and has the property that $\exists x(P(x, \bar{y}))$ is $(M, L)$-absolute. After this we must verify constructible closure over all $P \in \mathcal{E}$.

We reproduce the construction in Section 7.3 of Chapter II. We construct the set $M_i$ by induction. Let $\bar{Y} = (Y_1, \ldots, Y_n) \in M_i \times \cdots \times M_i$. We let $M_i'(\bar{Y})$ denote the class $\{X \mid P(X, Y_1, \ldots, Y_n) \text{ is } L\text{-true}\}$. We let $M_i''(\bar{Y})$ denote $\varnothing$ if $M_i'(\bar{Y})$ is empty, and $M_i'(\bar{Y}) \cap L_\alpha$ for the least $\alpha$ for which this intersection is nonempty otherwise. Since $L(x, y)$ is absolute (see §5), it is not hard to see that the function $M_i''$, $\operatorname{dom} M_i'' = M_i \times \cdots \times M_i$, is constructible. Because the constructible axiom of choice holds in $L$, we can obtain a constructible function $F_i$ by choosing one element from each nonempty $M_i''(\bar{Y})$. Let $N_i$ be the set of values of $F_i$. This set is constructible, since all of our constructions are absolute; and if $M_i$ is infinite, then $\operatorname{card}_L(N_i) = \operatorname{card}_L(M_i)$. We set $M_{i+1} = M_i \cup N_i$ and $M = \bigcup M_i$. The set $M$ has the required properties; obviously, $\operatorname{card}_L(M) + \omega_0 = \operatorname{card}_L(M_0) + \omega_0$ in $L$. The formal transition from $\{M_i\}$ to $M$ is realized by considering a function that "closes" $M_0$, as in Section 5.11 below.
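The scheme $M_{i+1} = M_i \cup N_i$, $M = \bigcup M_i$ can be imitated on a finite structure. The sketch below is a toy model under strong simplifying assumptions: the universe is a finite set of integers, the formula has a single parameter $y$, and "the least $\alpha$" is replaced by "the least witness". The names `least_witness` and `close_under_witnesses` are hypothetical.

```python
def least_witness(universe, predicate, y):
    """Least x in `universe` with predicate(x, y), or None if there is none."""
    for x in sorted(universe):
        if predicate(x, y):
            return x
    return None

def close_under_witnesses(universe, predicate, start):
    """Iterate M_{i+1} = M_i ∪ N_i until no new witness appears."""
    current = set(start)
    while True:
        # N_i: one chosen witness for each parameter in M_i that has one.
        new = {least_witness(universe, predicate, y) for y in current}
        new.discard(None)
        if new <= current:
            return current
        current |= new
```

For the resulting set $M$ and every parameter $y \in M$, a witness exists in the whole universe if and only if one exists in $M$, which is the finite analogue of $\exists x\, P(x, y)$ being absolute between $M$ and the ambient class.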


4.6. Proposition. For every constructible set $N$ such that the extensionality axiom is $N$-true there exist a unique constructible transitive set $M$ and isomorphism $f : (N, \in) \to (M, \in)$.

PROOF. The plan of proof is the same as in Section 7.6 of Chapter II. First let "$f$ is a continuous $(\alpha+1)$-sequence" be the formula "$\alpha$ is an ordinal" $\wedge$ "$f$ is a function" $\wedge$ $\operatorname{dom} f = \alpha + 1$ $\wedge$ $(\forall \beta \in \alpha + 1)(\beta \text{ a limit ordinal} \Rightarrow f(\beta) = \bigcup_{\gamma \in \beta} f(\gamma))$. This formula is shown to be $L$-absolute as in Section 5.14 below.

Now consider the $L$-absolute operation $\Phi(Z) = \{X \mid X \in N \wedge X \cap N \subset Z\}$, and let $\varnothing_N$ be the unique member of $N$ such that $\varnothing_N \cap N = \varnothing$. Finally, let $\psi(\alpha, y)$ be the formula

\[
(\exists f)\bigl(\text{"} f \text{ is a continuous } (\alpha+1)\text{-sequence"} \wedge f(0) = \{\varnothing_N\} \wedge (\forall \beta \in \alpha)(f(\beta+1) = \Phi(f(\beta))) \wedge y = f(\alpha)\bigr).
\]

Then $\psi$ is $L$-absolute, as can be shown as in Sections 5.14 and 5.15 below, and $\psi(\alpha, y)$ is $L$-true if and only if $y = N_\alpha$ in the sense of Chapter II, Section 7.6.

We now set $N' = \bigcup_\alpha N_\alpha = \{z \mid (\exists \alpha)(\exists y \subset N)(\psi(\alpha, y) \wedge z \in y)\}$. We show that $N' = N$. Clearly $N' \subset N$, and if $N \setminus N' = Y$ were nonempty, it would follow by the regularity axiom, which holds in $L$, that $\exists Z(Z \in Y \wedge Z \cap Y = \varnothing)$. For this $Z$ we would have $Z \subset N'$, hence $Z \subset N_\alpha$ for a suitable $\alpha$, so that $Z \in N_{\alpha+1}$, which is a contradiction.

The implication $Z \subset N' \Rightarrow \exists \alpha(Z \subset N_\alpha)$, which we have used here, follows because there exists an absolute function on $N'$ that associates to each $X$ the least $\alpha$ for which $X \in N_\alpha$. The replacement axiom shows that there exists an ordinal $\alpha_0$, namely, the least upper bound of the values of this function, for which $N = N' = N_{\alpha_0}$. This ordinal, which is fixed for $N$, occurs in our subsequent construction, which is verified to be absolute as in §5.

Let "$h$ is a constructing $(\alpha+1)$-sequence for $N$, $M$" be the formula "$h$ is a continuous $(\alpha+1)$-sequence" $\wedge$ $h(0) = \{\langle \varnothing_N, \varnothing \rangle\}$ $\wedge$ "$(\forall \beta \in \alpha)(h(\beta+1)$ is a function $\wedge$ $\operatorname{dom} h(\beta+1) = N_{\beta+1}$ $\wedge$ the value of $h(\beta+1)$ on any $X \in N_{\beta+1}$ is the set of $h(\beta)$-images of elements of $X \cap N)$." Then for each $\alpha$ there is a unique such $h$; let $M_\alpha$ be the image of $h(\alpha)$. For $\alpha = \alpha_0$ we obtain a function $h : N \to M = M_{\alpha_0}$, where $M$ is our desired constructible set and $h$ is a constructible $\in$-isomorphism.

The proposition is proved.
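For hereditarily finite sets the collapsing isomorphism can be computed directly by recursion on membership, instead of through the transfinite sequences used in the proof. The sketch below assumes sets are modeled as nested `frozenset` values and that $N$ is finite and extensional; the names `collapse` and `is_transitive` are hypothetical.

```python
def collapse(N):
    """Return (M, f) with f(X) = {f(Y) : Y in X and Y in N} and M = f[N]."""
    f = {}

    def image(X):
        # Recursion terminates because membership is well founded.
        if X not in f:
            f[X] = frozenset(image(Y) for Y in X if Y in N)
        return f[X]

    M = frozenset(image(X) for X in N)
    return M, f

def is_transitive(M):
    """Every element of an element of M is an element of M."""
    return all(z in M for y in M for z in y)
```

As a usage example, take $a = \{\{\varnothing\}\}$, $b = \{a\}$, $c = \{a, b, \varnothing\}$ and $N = \{a, b, c\}$. None of the elements of $a$ lies in $N$, so $a$ plays the role of $\varnothing_N$, and the collapse sends $a, b, c$ to the ordinals $0, 1, 2$.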




5 Constructibility Formula 


5.1. The purpose of this section is to prove Propositions 4.2 and 3.2. Both proofs 
are extremely straightforward, and simply consist in writing out explicitly the 
formulas L(x,y) and N(x,y) and verifying that the conditions in Lemma 2.4 
apply. But since these formulas are very long, we perform the verifications in 
a series of “blocks,” in order to improve their appearance and to make the 
interpretation and verification of the conditions in 2.4 easier. As soon as a 
block (subformula) is constructed and its absoluteness is verified, we replace it 
by an abbreviated notation in the next formula. 

The material within each subsection is arranged in the following order: 
first the abbreviated notation for the formula that is being constructed and 
shown to be absolute in the subsection; then the complete form of the for- 
mula; and finally any remarks that may be needed regarding absoluteness. 
The “complete form” of the formula may contain abbreviated notation for 
subformulas. If such a subformula has not yet been interpreted in detail and 
shown to be absolute, this is done right after the complete form.

By absoluteness we mean "$M$-absoluteness for any transitive model $M$ for the axioms without the axiom of choice."

Sections 5.2-5.15 are devoted to the formula $L(x, y)$, and Sections 5.18-5.20 are devoted to the formula $N(x, y)$. As the material we are dealing with accumulates, we shall allow ourselves to omit more and more details and to rely on the reader's experience.

The formulas $z = F_i(x, y)$, $i = 1, 2, 3$; $z = F_j(y)$, $j = 4, 5, 6, 7, 8$.


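5.2. $z = \{x, y\}$: $(\forall u \in z)(u = x \vee u = y) \wedge x \in z \wedge y \in z$. This whole formula is clearly absolute by Lemma 2.4. From now on we shall not even comment on such simple cases.

5.3. $z = x \setminus y$: $(\forall u \in z)(u \in x \wedge u \notin y) \wedge (\forall u \in x)(u \notin y \Rightarrow u \in z)$.

5.4. $z = x \times y$: $(\forall u_1 \in x)(\forall u_2 \in y)(\langle u_1, u_2 \rangle \in z) \wedge (\forall u \in z)(\exists u_1 \in x)(\exists u_2 \in y)(u = \langle u_1, u_2 \rangle)$;

$\langle u_1, u_2 \rangle \in z$: $(\exists u \in z)(u = \langle u_1, u_2 \rangle)$;

$u = \langle u_1, u_2 \rangle$: $(\forall v \in u)(v = \{u_1\} \vee v = \{u_1, u_2\}) \wedge \{u_1\} \in u \wedge \{u_1, u_2\} \in u$;

$\{u_1, u_2\} \in u$: $(\exists v \in u)(v = \{u_1, u_2\})$.

5.5. $z = F_4(y) = \operatorname{dom} y$: $(\forall u \in z)(\exists v \in \bigcup\bigcup(y))(\langle u, v \rangle \in y) \wedge (\forall u \in \bigcup\bigcup(y))(\forall v \in \bigcup\bigcup(y))(\langle u, v \rangle \in y \Rightarrow u \in z)$.

Here $\bigcup\bigcup$ appears because $\langle u, v \rangle = \{\{u\}, \{u, v\}\} \in y \Rightarrow u, v \in \bigcup\bigcup(y)$. This formula is absolute, since a transitive model is closed with respect to the operation $\bigcup$ (see 3.1(e)). We shall write $\bigcup^2 = \bigcup\bigcup$, and so on.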
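The role of the double union in 5.5 can be seen concretely with ordered pairs coded as $\{\{u\}, \{u, v\}\}$. The following sketch works with hereditarily finite sets modeled as nested `frozenset` values; the helper names `pair`, `union`, and `dom` are hypothetical. Both quantifiers in `dom` range only over the double union of $y$, mirroring the restricted quantifiers of the formula.

```python
def pair(u, v):
    """Ordered pair <u, v> coded as {{u}, {u, v}}."""
    return frozenset([frozenset([u]), frozenset([u, v])])

def union(y):
    """The union of the elements of y."""
    return frozenset(element for member in y for element in member)

def dom(y):
    """F_4(y), with both quantifiers restricted to union(union(y))."""
    candidates = union(union(y))
    return frozenset(
        u for u in candidates
        if any(pair(u, v) in y for v in candidates)
    )
```

If $y$ consists of the pairs $\langle 0, 1 \rangle$, $\langle 1, 2 \rangle$, $\langle 2, 2 \rangle$ on finite ordinals, then `union(union(y))` is $\{0, 1, 2\}$, which already contains every component of every pair, so no search outside it is needed.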


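5.6. $z = F_5(y)$: $(\forall u \in z)(\exists v \in y)(\exists w \in y)(v \in w \wedge u = \langle v, w \rangle) \wedge (\forall v \in y)(\forall w \in y)(v \in w \Rightarrow \langle v, w \rangle \in z)$.

5.7. $z = F_6(y)$: $(\forall u \in z)(\exists u_1 \in \bigcup^4(y))(\exists u_2 \in \bigcup^4(y))(\exists u_3 \in \bigcup^2(y))(\langle u_1, u_2, u_3 \rangle \in y \wedge u = \langle u_3, u_1, u_2 \rangle) \wedge (\forall u_1 \in \bigcup^4(y))(\forall u_2 \in \bigcup^4(y))(\forall u_3 \in \bigcup^2(y))(\langle u_1, u_2, u_3 \rangle \in y \Rightarrow \langle u_3, u_1, u_2 \rangle \in z)$. Here $\bigcup^4$ appears for the same reason as $\bigcup^2$ in 5.5. The formulas $\langle u_1, u_2, u_3 \rangle \in y$, etc., are shown to be absolute in the same way as in 5.4.

The operations $F_7$ and $F_8$ are treated analogously to $F_6$.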

The formulas $y = \widetilde{F}_i(x \times x)$, for $i = 1, 2, 3$; $y = \widetilde{F}_j(x)$, for $j = 4, 5, 6, 7, 8$.

5.8. $y = \widetilde{F}_i(x \times x)$, $i = 1, 2, 3$:

\[
(\forall u \in y)(\exists u_1 \in x)(\exists u_2 \in x)(u = F_i(u_1, u_2)) \wedge (\forall u_1 \in x)(\forall u_2 \in x)(F_i(u_1, u_2) \in y),
\]

where $F_i(u_1, u_2) \in y$: $(\exists v \in y)(v = F_i(u_1, u_2))$.

5.9. $y = \widetilde{F}_j(x)$, $j = 4, \ldots, 8$: $(\forall u \in y)(\exists v \in x)(u = F_j(v)) \wedge (\forall v \in x)(F_j(v) \in y)$.

5.10. $y = \Phi(x)$ (see 1.4):

\[
(\forall z \in y)(z \in x \vee z \in \widetilde{F}_1(x \times x) \vee \cdots \vee z \in \widetilde{F}_8(x)) \wedge (\forall z \in x)(z \in y) \wedge (\forall z \in \widetilde{F}_1(x \times x))(z \in y) \wedge \cdots \wedge (\forall z \in \widetilde{F}_8(x))(z \in y).
\]

The class $L$ is closed with respect to the operations $\widetilde{F}_i$. In fact, suppose, for example, that $i \geq 4$, and let $X \in L$. Let $U \in L$ be a set containing all $F_i(Y)$ for $Y \in X$. Then

\[
\widetilde{F}_i(X) = \{Z \mid Z \in U,\ (\exists y \in X)(Z = F_i(y)) \text{ is } V\text{-true}\}.
\]

Since the formula $Z = F_i(y)$ has been shown to be absolute, we may replace "$V$-true" by "$L$-true" here, and then apply Proposition 2.3. Thus, the formula $y = \Phi(x)$ is $L$-absolute by Lemma 2.4.

If $M$ is an arbitrary transitive model, then the verification that $M$ is closed with respect to $\widetilde{F}_i$ is somewhat different. Namely, the formula $\forall x\, \exists! y(y = \widetilde{F}_i(x))$ is obviously $V$-true. The formal deduction of this formula does not use the axiom of choice. Hence, the formula is $M$-true for any transitive model $M$. We therefore have $Y \in M$ if $X \in M$, where $Y = \widetilde{F}_i(X)$. We shall use this device many times in what follows.


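5.11. "$g$ closes $x$," which is short for "$g$ is a function on $\omega_0$, and $g(n) = \Phi^n(x)$ for all $n \in \omega_0$." We write the formula with the constant $\omega_0$ and the free variables $g$ and $x$:

\[
\text{"} g \text{ is a function"} \wedge F_4(g) = \omega_0 \wedge g(0) = x \wedge (\forall n \in \omega_0)(g(n+1) = \Phi(g(n))).
\]

Here:

(a) "$g$ is a function":

\[
(\forall u \in g)(\exists u_1 \in \textstyle\bigcup^2(g))(\exists u_2 \in \bigcup^2(g))(u = \langle u_1, u_2 \rangle) \wedge (\forall u_1 \in \bigcup^2(g))(\forall u_2 \in \bigcup^2(g))(\forall u_3 \in \bigcup^2(g))(\langle u_1, u_2 \rangle \in g \wedge \langle u_1, u_3 \rangle \in g \Rightarrow u_2 = u_3).
\]

(b) $g(n+1) = \Phi(g(n))$:

\[
(\exists y \in \textstyle\bigcup^2(g))(\langle n, y \rangle \in g \wedge \langle n \cup \{n\}, \Phi(y) \rangle \in g),
\]

where

\[
\langle n \cup \{n\}, \Phi(y) \rangle \in g:\quad (\exists u \in \textstyle\bigcup^2(g))(\exists v \in \bigcup^2(g))(u = n \cup \{n\} \wedge v = \Phi(y) \wedge \langle u, v \rangle \in g).
\]

Since $\omega_0 \in M$, the formula 5.11 is now easily seen to be absolute by the previous results.

In 5.11 we took the liberty of using $g$ and $n$ for variables of $L_1\mathrm{Set}$ in order to make the formulas intuitively clearer. In what follows we shall also use $\alpha$, $\beta$, $K$ and $N$ as variables, thereby temporarily ignoring our convention of using only lowercase letters at the end of the Latin alphabet.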
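The recursion $g(0) = x$, $g(n+1) = \Phi(g(n))$ of 5.11 can be run for a few steps on hereditarily finite sets. The sketch below is deliberately simplified: its `phi` adjoins only unordered pairs and differences (the operations of 5.2 and 5.3), not all eight operations, so it is an assumption-laden stand-in for $\Phi$. The names `phi` and `closing_function` are hypothetical, and sets are nested `frozenset` values.

```python
def phi(x):
    """Simplified stand-in for Phi: x plus all pairs {u, v} and differences."""
    pairs = {frozenset([u, v]) for u in x for v in x}
    differences = {u - v for u in x for v in x}
    return frozenset(x) | pairs | differences

def closing_function(x, steps):
    """Finite part of g: g[0] = x and g[n + 1] = phi(g[n])."""
    g = [frozenset(x)]
    for _ in range(steps):
        g.append(phi(g[-1]))
    return g
```

Starting from $x = \{\varnothing\}$, the first step gives $\{\varnothing, \{\varnothing\}\}$, and the values form an increasing chain, as the definition of $\Phi$ in 5.10 requires ($x \subset \Phi(x)$).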


5.12. $y \in \mathcal{J}(x)$: $\exists g(\text{"} g \text{ closes } x\text{"} \wedge (\exists n \in \omega_0)(\langle n, y \rangle \in g))$. Here the quantifier over $g$ is not restricted. Since the formula under the $\exists g$ sign is absolute, we may conclude directly from the definition $|\ |_M(\xi) = |\ |_V(\xi)$, $\xi \in \overline{M}$, that $y \in \mathcal{J}(x)$ is also absolute, provided we show that for any $X \in M$, the function $G \in V$ that closes $X$ lies in $M$. The formula $\forall x\, \exists! g(\text{"} g \text{ closes } x\text{"})$ is obviously $V$-true. If we formalize the verification of this fact, we see that this formula is deducible from the axioms without the axiom of choice. Hence it is $M$-true. This implies that for any $X \in M$ we have $G \in M$.


5.13. $y \in \mathcal{P}(x) \cap \mathcal{J}(x \cup \{x\})$: $(\forall z \in y)(z \in x) \wedge y \in \mathcal{J}(x \cup \{x\})$.


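5.14. "$f$ is the constructing $(\alpha+1)$-sequence," which is short for "$\alpha$ is an ordinal" $\wedge$ "$f$ is a function" $\wedge$ $\operatorname{dom} f = \alpha + 1$ $\wedge$ $(\forall \beta \in \alpha + 1)(f(\beta) = L_\beta)$. Here:

(a) $(\forall \beta \in \alpha + 1)(f(\beta) = L_\beta)$:

\[
(\forall \beta \in \alpha + 1)\bigl((\beta \text{ is a limit ordinal} \Rightarrow f(\beta) = \textstyle\bigcup_{\gamma \in \beta} f(\gamma)) \wedge (f(\beta+1) = \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\}))\bigr).
\]

(b) "$\beta$ is a limit ordinal": "$\beta$ is an ordinal" $\wedge$ $(\forall \alpha \in \beta)(\beta \neq \alpha \cup \{\alpha\})$.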
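(c) $f(\beta) = \bigcup_{\gamma \in \beta} f(\gamma)$: $(\exists v \in \bigcup^2(f))(v = \bigcup_{\gamma \in \beta} f(\gamma) \wedge \langle \beta, v \rangle \in f)$;

$v = \bigcup_{\gamma \in \beta} f(\gamma)$: $(\forall u \in v)(\exists \gamma \in \beta)(u \in f(\gamma)) \wedge (\forall \gamma \in \beta)(\forall u \in \bigcup^3(f))(u \in f(\gamma) \Rightarrow u \in v)$;

$u \in f(\gamma)$: $(\exists w \in \bigcup^2(f))(\langle \gamma, w \rangle \in f \wedge u \in w)$.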


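(d) $f(\beta+1) = \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\})$:

\[
(\exists u \in \textstyle\bigcup^2(f))\bigl(\langle \beta+1, u \rangle \in f \wedge (\forall v \in u)(v \in \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\})) \wedge \forall v(v \in \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\}) \Rightarrow v \in u)\bigr);
\]

$v \in \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\})$: $(\exists u \in \bigcup^2(f))(\langle \beta, u \rangle \in f \wedge v \in \mathcal{P}(u) \cap \mathcal{J}(u \cup \{u\}))$.

Finally, in order to verify directly that the subformula

\[
\forall v(v \in \mathcal{P}(f(\beta)) \cap \mathcal{J}(f(\beta) \cup \{f(\beta)\}) \Rightarrow v \in u)
\]

is $M$-absolute, it suffices to show that $M$ is closed with respect to the operation $X \mapsto \mathcal{P}(X) \cap \mathcal{J}(X \cup \{X\})$. But $M$ is closed with respect to both $\mathcal{J}$ and $X \mapsto \mathcal{P}(X) \cap M$, so the verification is complete.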


5.15. $L(x, y)$: "$y$ is an ordinal and $x \in L_y$": "$y$ is an ordinal" $\wedge$ $\exists f(\text{"} f \text{ is the constructing } (y+1)\text{-sequence"} \wedge (\exists z \in \bigcup^2(f))(\langle y, z \rangle \in f \wedge x \in z))$. Since the quantifier $\exists f$ is not bounded, in order to verify this last absoluteness statement we must show that the constructing $(Y+1)$-sequence $F$ is an element of $M$ for any ordinal $Y$ in $M$. We use the same argument as in 5.12: the formula $\forall y(y \text{ is an ordinal} \Rightarrow \exists! f(f \text{ is the constructing } (y+1)\text{-sequence}))$ not only is $V$-true, but also is deducible from the axioms without the axiom of choice; therefore it is $M$-true.

This completes the proof of Proposition 4.2. 


5.16. Remark. The formula $\forall x\, \exists y\, L(x, y)$ is often written in the form $V = L$, 
and is called the axiom of constructibility. The absoluteness of $L(x, y)$ implies 
that the following formula is L-true: 

$$|\forall x\, \exists y\, L(x, y)|_L = \inf_{X \in L} \sup_{Y \in L} |L(X, Y)|_L = \inf_{X \in L} \sup_{Y \in L} |L(X, Y)|_V = 1.$$

Hence, this formula is consistent with the Zermelo-Fraenkel axioms. On the 
other hand, $V = L$ implies the generalized continuum hypothesis (GCH), and 
since the negation of the GCH is also consistent with the Zermelo-Fraenkel 
axioms, it follows that $\neg(V = L)$ is consistent with the axioms. 

We now proceed to the proof of Proposition 3.2. This proof follows the 
same plan as the proof of Proposition 4.2. We return to the conventions and 
constructions in 1.7-1.8. 


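5.17. Remark. In Section 4.4 we exploited the fact that the assertion 
“$\alpha \ge \omega_0 \Rightarrow \operatorname{card}(L_\alpha) = \operatorname{card}(\alpha)$” is formally deducible from the axioms of $L_1\mathrm{Set}$ 
(without the axiom of choice). We may now see that such a formal 
deduction can be obtained by exactly mimicking the proof in Section 1.4. 
Indeed, from the definition of $L(x, y)$ we have the formal deducibility of 
“$L_{\alpha+1} = \mathcal{P}(L_\alpha) \cap \mathcal{J}(L_\alpha \cup \{L_\alpha\})$” and “$\beta$ a limit ordinal $\Rightarrow L_\beta = \bigcup_{\gamma \in \beta} L_\gamma$”. Moreover, 
the following are deducible: “$\operatorname{card}(X) < \omega_0 \Rightarrow \operatorname{card}(X) < \operatorname{card}(\mathcal{P}(X)) < \omega_0$” 
and “$\operatorname{card}(X) \ge \omega_0 \Rightarrow \operatorname{card}(\mathcal{J}(X)) = \operatorname{card}(X)$.” As a result, the assertions 
“$\operatorname{card}(L_{\omega_0}) = \omega_0$,” “$\operatorname{card}(L_\alpha) \ge \omega_0 \Rightarrow \operatorname{card}(L_{\alpha+1}) = \operatorname{card}(L_\alpha)$,” and “$\beta$ a limit 
ordinal $\Rightarrow \operatorname{card}(L_\beta) = \operatorname{card}(\bigcup_{\gamma \in \beta} L_\gamma)$” are all deducible. And from these and 
the axioms of $L_1\mathrm{Set}$ the desired assertion may be deduced (using, in particular, 
the deducibility of “$\operatorname{card}(\omega_0) = \omega_0$,” “$\alpha \ge \omega_0 \Rightarrow \operatorname{card}(\alpha + 1) = \operatorname{card}(\alpha)$,” 
“$\beta$ is a limit ordinal $\Rightarrow \beta = \bigcup_{\gamma \in \beta} \gamma$,” and in addition an instance of transfinite 
induction on the ordinals, which is of course also formally deducible in $L_1\mathrm{Set}$). 


5.18. The formula $H(K, x)$: K is a function $\wedge$ x is an ordinal $\wedge \operatorname{dom} K = x + 1 
\wedge K(0) = \langle 0, 0, 0\rangle \wedge (\forall y \in x + 1)$(K(y) is an important triple $\wedge$ K(y + 1) is the 
next important triple after K(y)) $\wedge$ (y is a limit ordinal $\Rightarrow K(y) = \lim_{z \in y} K(z)$) 
is absolute. 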

We shall not analyze the subformulas that have been considered before. The 
following subformulas remain: 


(a) “K(y) is an important triple $\wedge$ K(y + 1) is the next important triple after 
K(y)”; 
(b) $K(y) = \lim_{z \in y} K(z)$. 


We shall have to use the absoluteness of the auxiliary formula “$y = x_{(i)}$,” 
which is short for “x is an important triple (i.t.) and y is the ith coordinate of 
x,” where i = 1, 2, or 3. That is: 

$$(\exists u_1 \in \bigcup^3(x))(\exists u_2 \in \bigcup^3(x))(\exists u_3 \in \bigcup^3(x))\,(x = \langle u_1, u_2, u_3\rangle \wedge u_1 \text{ is an ordinal} \wedge u_1 \le 8 \wedge u_2 \text{ is an ordinal} \wedge u_3 \text{ is an ordinal} \wedge y = u_i).$$


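The complete form of (a) is 

$$(\exists u \in \bigcup^2(K))(\exists v \in \bigcup^2(K))(\langle y, u\rangle \in K \wedge \langle y + 1, v\rangle \in K \wedge u \text{ is an i.t.} \wedge v \text{ is the i.t. after } u).$$

According to Lemma 1.7(a), “u is an i.t. $\wedge$ v is the i.t. after u” can be written in 
the form $\bigvee_i C_i(u, v)$, where $C_i(u, v)$ is the formalization of the ith alternative 
in 1.7(a). For example, 

$C_1$: u is an i.t. $\wedge$ v is an i.t. $\wedge\, u_{(1)} \le 7 \wedge v_{(1)} = u_{(1)} + 1 \wedge v_{(2)} = u_{(2)} \wedge v_{(3)} = u_{(3)}$; 
$C_2$: u is an i.t. $\wedge$ v is an i.t. $\wedge\, u_{(1)} = 8 \wedge u_{(2)} + 1 < u_{(3)} \wedge v_{(1)} = 0 \wedge v_{(2)} = u_{(2)} + 1 \wedge v_{(3)} = 0$. 

The other $C_i$ are analogous, and are absolute for the same reasons. 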


The complete form of (b). Here we need to know that the following auxiliary 
formulas are absolute: 

$u = \bigcup_{z \in y} K(z)_{(i)}$, i = 2 or 3: $(\forall v \in u)(\exists z \in y)(v = K(z)_{(i)}) \wedge (\forall z \in y)(\exists v \in u)(v = K(z)_{(i)})$; 
$v = K(z)_{(i)}$: $(\exists w \in \bigcup^2(K))(\langle z, w\rangle \in K \wedge w \text{ is an i.t.} \wedge v = w_{(i)})$. 

Then, using Lemma 1.7(b), we explain the formula $K(y) = \lim_{z \in y} K(z)$ as 
follows: 

$$K(y)_{(1)} = 0 \wedge \exists u_2\, \exists u_3 \Bigl(u_2 = \bigcup_{z \in y} K(z)_{(2)} \wedge u_3 = \bigcup_{z \in y} K(z)_{(3)} \wedge \bigvee_{i=1}^{4} D_i(u_2, u_3)\Bigr),$$

where the alternatives $D_i$ have the following structure, depending on “how K(z) 
approaches K(y)”: 

$D_1$: $u_2 \in u_3 \wedge u_2$ is a limit ordinal $\wedge\, ((\exists z \in y)(K(z)_{(3)} = u_3) \Rightarrow K(y)_{(2)} = u_2 \wedge K(y)_{(3)} = u_3)$; 
$D_2$: $u_2 = u_3 \wedge u_2$ is a limit ordinal $\wedge\, ((\exists z \in y)(K(z)_{(3)} = u_3) \wedge (\forall z \in y)(K(z)_{(2)} \in u_2) \Rightarrow K(y)_{(2)} = u_2 \wedge K(y)_{(3)} = 0)$; 
$D_3$: $u_2 > u_3 \wedge u_3$ is a limit ordinal $\wedge\, ((\forall z \in y)(K(z)_{(3)} \in u_3) \Rightarrow K(y)_{(2)} = u_2 \wedge K(y)_{(3)} = u_3)$; 
$D_4$: $u_2 = u_3 \wedge u_3$ is a limit ordinal $\wedge\, ((\forall z \in y)(K(z)_{(2)} \in u_2 \wedge K(z)_{(3)} \in u_3) \Rightarrow K(y)_{(2)} = 0 \wedge K(y)_{(3)} = u_3)$. 

It is therefore obvious that the $D_i$ are absolute. Even though the quantifiers 
$\exists u_2$ and $\exists u_3$ are not restricted, there is no problem, since when $K^\xi, y^\xi \in L$, this 
formula can be V-true only if $u_2$ and $u_3$ are uniquely determined ordinals and 
lie in L, which gives us L-truth. 


5.19. The formula $S(N, x)$: “x is an ordinal $\wedge$ N is a function $\wedge \operatorname{dom} N = x + 1 
\wedge (\forall y \in x + 1)$(N(y) is a constructible set with N-number y)” is absolute. 
We shall need to know that the following auxiliary formula is absolute: 

$$y = (x)_i, \quad i = 1, 2, 3, \quad \text{where } K(x) = \langle (x)_1, (x)_2, (x)_3\rangle$$

(not to be confused with the formula $y = x_{(i)}$ in 5.18, which occurs here as a 
subformula): x is an ordinal $\wedge\, \exists K\,(H(K, x) \wedge (\exists u \in \bigcup^2(K))(\langle x, u\rangle \in K \wedge y = u_{(i)}))$. 
Even though $\exists K$ is not restricted, this does not cause any problem, because for 
every ordinal $x^\xi \in L$, the value of $K^\xi$ making $H(K^\xi, x^\xi)$ V-true lies in L. In 
fact, the V-true formula 

$$\forall x\,(x \text{ is an ordinal} \Rightarrow \exists!\, K\,(H(K, x)))$$

is deducible from the axioms without the axiom of choice, and hence is L-true. 

We now return to $S(N, x)$. We need only show that the subformula “N(y) is 
a constructible set with N-number y” is absolute. By definition, this subformula 
can be written as $\bigvee_{i=0}^{8} Q_i(y, N)$, where the alternatives have the form 

$Q_0$: $(y)_1 = 0 \wedge \langle y, L_{(y)_2}\rangle \in N$; 
$Q_i$, i = 1, 2, 3: $(y)_1 = i \wedge \langle y, F_i(N((y)_2), N((y)_3))\rangle \in N$; 
$Q_i$, i = 4, ..., 8: $(y)_1 = i \wedge \langle y, F_i(N((y)_2))\rangle \in N$. 


The absoluteness of the subformulas that have not been analyzed is clear 
from the following complete forms of these formulas: 


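(a) $\langle y, L_{(y)_2}\rangle \in N$: $(\exists z \in \bigcup^2(N))(\langle y, z\rangle \in N \wedge z = L_{(y)_2})$; 
$z = L_{(y)_2}$: $(\exists u \in y + 1)(u = (y)_2 \wedge z = L_u)$; 
$z = L_u$: $(\forall v \in z)(v \in L_u) \wedge \forall v\,(v \in L_u \Rightarrow v \in z)$. 

We can verify directly that the last subformula, with the unrestricted quantifier 
$\forall v$, is absolute, since $L_U \in L$ for any ordinal U, and L is transitive. 

(b) $\langle y, F_i(N((y)_2))\rangle \in N$, i = 4, ..., 8: 
$(\exists u, v, w \in \bigcup^2(N))(u = (y)_2 \wedge \langle u, v\rangle \in N \wedge w = F_i(v) \wedge \langle y, w\rangle \in N)$. 
$\langle y, F_i(N((y)_2), N((y)_3))\rangle \in N$, i = 1, 2, 3: 
$(\exists u_2, u_3, v_2, v_3, w \in \bigcup^2(N))(u_2 = (y)_2 \wedge u_3 = (y)_3 \wedge \langle u_2, v_2\rangle \in N 
\wedge \langle u_3, v_3\rangle \in N \wedge w = F_i(v_2, v_3) \wedge \langle y, w\rangle \in N)$. 
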
5.20. The formula $N(x, y)$: “x is an ordinal $\wedge\, y = N(x)$” is absolute. 
In fact, this formula is written in the form 

$$\exists N\,(S(N, x + 1) \wedge \langle x, y\rangle \in N).$$

There is no problem with $\exists N$ being unrestricted, since we can apply the same 
type of argument as we have used many times before: for any ordinal $x^\xi$ there is 
a unique $N^\xi$ making this formula V-true, and then $N^\xi \in L$, since the formula 
$\forall x\,(x \text{ is an ordinal} \Rightarrow \exists!\, N\,(S(N, x + 1)))$ is deducible from the axioms without 
the axiom of choice, and hence is L-true. 

This completes the proof of Proposition 3.2. 


6 Remarks on Formalization 


Gödel’s theory, to which this chapter is devoted, is usually presented in a more 
syntactic version. We shall now briefly describe the system of basic ideas and 
the most important changes in the proofs in this version, in which the least 
possible appeal is made to the semantics. 


6.1. Let Q(x) be a formula in $L_1\mathrm{Set}$ with one free variable x. Let ZF be the set 
of all the (logical, special, and equality) axioms of $L_1\mathrm{Set}$ except for the axiom 
of choice. Q(x) is said to be transitive if 

$$\mathrm{ZF} \vdash (Q(x) \wedge y \in x) \Rightarrow Q(y).$$


6.2. The relativization $P_Q$ of a formula P in $L_1\mathrm{Set}$ relative to Q is defined by 
induction on the number of connectives and quantifiers in P: 

$(x \in y)_Q$ is $Q(x) \wedge Q(y) \Rightarrow x \in y$; 
$(x = y)_Q$ is $Q(x) \wedge Q(y) \Rightarrow x = y$; 
$(\neg P)_Q$ is $\neg(P_Q)$; 
$(P_1 * P_2)_Q$ is $(P_1)_Q * (P_2)_Q$, for any connective $*$; 
$(\forall x\, P)_Q$ is $\forall x\,(Q(x) \Rightarrow P_Q)$; 
$(\exists x\, P)_Q$ is $\exists x\,(Q(x) \wedge P_Q)$. 
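
The inductive definition of relativization is mechanical enough to be written as a short program. The following Python sketch is an illustration only: the encoding of formulas as nested tuples and the name `relativize` are assumptions of this example, not notation from the text.

```python
# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("in", x, y), ("eq", x, y)            atomic formulas
#   ("not", P)                            negation
#   ("and" | "or" | "imp" | "iff", P, R)  binary connectives
#   ("forall", x, P), ("exists", x, P)    quantifiers
#   ("Q", x)                              stands for the formula Q(x)

def relativize(p):
    """Return the relativization P_Q of the formula p."""
    tag = p[0]
    if tag in ("in", "eq"):
        _, x, y = p
        # Atomic case: Q(x) and Q(y) imply the atomic formula itself.
        return ("imp", ("and", ("Q", x), ("Q", y)), p)
    if tag == "not":
        return ("not", relativize(p[1]))
    if tag in ("and", "or", "imp", "iff"):
        return (tag, relativize(p[1]), relativize(p[2]))
    if tag == "forall":
        _, x, body = p
        return ("forall", x, ("imp", ("Q", x), relativize(body)))
    if tag == "exists":
        _, x, body = p
        return ("exists", x, ("and", ("Q", x), relativize(body)))
    raise ValueError(f"unknown formula tag: {tag!r}")
```

Applied to the tuple for "for all x there exists y with x in y", the function restricts both quantifiers to Q and guards the atomic formula, exactly as the clauses prescribe.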


6.3. Q(x) is called an (internal) model of $L_1\mathrm{Set}$ if for any axiom $P \in \mathrm{ZF}$ we 
have 

$$\mathrm{ZF} \vdash P_Q.$$

This model is transitive if Q is transitive. 
A formula $P(y_1, \dots, y_n)$ is called Q-absolute if 

$$\mathrm{ZF} \vdash (Q(y_1) \wedge \cdots \wedge Q(y_n)) \Rightarrow (P \Leftrightarrow P_Q).$$


6.4. The connection between these concepts and our earlier ones is as follows. 
Every formula Q(x) determines a class $M = \{X \in V \mid Q(X) \text{ is } V\text{-true}\}$. This 
class M has the property that 

$$|P|_M(\xi) = |P_Q|_V(\xi), \quad \forall \xi \in M,$$

for any formula P (as can easily be proved by induction on the number of 
connectives and quantifiers in P). Thus, to give a syntactic reformulation of 
our proofs we must make the following changes throughout: 


(a) We consider only classes M that are defined by formulas Q, and all 
references to M are replaced by references to Q. 

(b) We everywhere replace “P is V-true” by “P is deducible from ZF.” 

(c) We everywhere replace “P is M-true” by “$P_Q$ is deducible from ZF.” 

(d) We everywhere replace “P is M-absolute” by “P is Q-absolute.” 


In order for the new assertions on deducibility from ZF to become sufficiently 
obvious, we must either do some additional work formalizing the proofs or else 
give more careful intuitive proofs. In particular, we must find finite subsets of 
ZF from which the various facts are deducible. The basic results are stated as 
follows in the new syntactic language: 


6.5. $\exists y\, L(x, y)$ “is” a transitive internal model of $L_1\mathrm{Set}$. 


6.6. $\mathrm{ZF} \vdash (\text{axiom of choice})_{\exists y\, L(x, y)}$. 


6.7. $\mathrm{ZF} \vdash (\text{generalized continuum hypothesis})_{\exists y\, L(x, y)}$. 


6.8. Thus, a completely syntactic version of Gödel’s theory would consist of all 
the deductions implicit in 6.5-6.7, without any commentary. Of course, such a 
treatment has never been written. The formula $\exists y\, L(x, y)$ alone takes up several 
pages; without appealing to semantics, it would be impossible either to think 
up, or to explain, or even to copy down all this without making mistakes. 
The deductions of all the required relativized formulas $P_{\exists y\, L(x, y)}$ would also 
be extremely long. This situation gives us an instructive example of what was 
discussed in “Digression: Proof” in Chapter II. 


7 What Is the Cardinality of the Continuum? 


After all we have learned about the Zermelo—Fraenkel language and axiom 
system, it might seem naive to return to this question. But we must do so 
if we consider mathematical meaning to be our primary concern. 
Some specialists in the foundations of mathematics espouse a different 
point of view. Namely, they answer that the question itself is meaningless. It 
seems that Paul Cohen himself tends toward this viewpoint, at the same time 
admitting that “this is a hard decision” (P. Cohen, Comments on the founda- 
tions of set theory, Proc. Symp. Pure Math., vol. XIII, part I, American Math. 
Soc., Providence 1971, p. 12). 

From this point of view it is natural to reject almost the entire semantics 
of $L_1\mathrm{Set}$, including all the $V_\alpha$ starting with $\alpha = \omega_0 + 1$ in the von Neumann 
universe. No halfway solutions can help matters, especially since questions con- 
cerning higher axioms of infinity or the so-called measurable cardinals are in an 
even worse position than the CH. 

It thus becomes necessary to try to find alternative languages and semantics. 
Here the differences of opinion are wide and irreconcilable. The most clear-cut 
position is that of the constructivists, although even among them there are 
different shades of opinion. The constructivists do not recognize infinity as a 
usable concept, and reject ineffective existence proofs. (It turns out that in prac- 
tice they often replace these ineffective proofs by a more carefully differentiated 
word usage— “there cannot not exist,” or “there quasi-exists” —which is nearly 
synonymous with certain linguistic precautions adopted in classical texts.) In 
our opinion, the shortcoming in their point of view is that constructivism is 
in no sense “another mathematics.” It is, rather, a sophisticated subsystem of 
classical mathematics, which rejects the extremes in classical mathematics and 
carefully nourishes its effective computational apparatus. 

Unfortunately, it seems that it is these “extremes”—bold extrapolations, 
abstractions that are infinite and do not lend themselves to a constructivist 
interpretation—that make classical mathematics effective. One should try to 
imagine how much help mathematics could have provided twentieth-century 
quantum physics if for the past hundred years it had developed using only 
abstractions from “constructive objects.” Most likely, the standard calcula- 
tions with infinite-dimensional representations of Lie groups that today play an 
important role in understanding the microworld would simply never have 
occurred to anyone. 

It is not impossible that a new (or a completely forgotten old) conception 
of the continuum, in which the continuum has no “cardinality,” could be found 
in the course of a deep investigation of the external world. The notion of a set 
consisting of elements may actually be adequate only for finite or countable 
sets, and “higher infinities” may turn out to be abstractions from objects of a 
completely different type. 

Physics seems to point up a difference in principle between “counting” 
and the Eudoxus-Dedekind idealization of measurement. The counting procedure 
applies to regions of attraction, or “attractors” (R. Thom), that are units 
not having sharp boundaries. The parts of a unit, even if they have physical 
meaning, are nevertheless attractors of a different sort. But even these ideas 
apparently stop making sense in the microworld. 


If nature has a fundamentally statistical aspect, it might be fruitful to 
consider mathematical models in which the statistical aspect appears as an 
undefined concept. The unexpected richness of the nonstandard interpretations 
of classical mathematics in Boolean-valued models agrees with the suggestion 
that all the words we say should be understood in a new way. 


7.2. We now discuss a less radical point of view on the continuum problem, 
according to which this question of its cardinality is meaningful. Then the main 
problem once again becomes how to determine the place of the continuum on 
the scale of alephs. 

Cohen concludes his book with the following opinion: “A point of view which 
the author feels may eventually come to be accepted is that CH is obviously 
false.... C is greater than $\aleph_n, \aleph_\omega, \aleph_\alpha$ where $\alpha = \aleph_\omega$ etc. This point of view 
regards C as an incredibly rich set given to us by one bold new axiom, which 
can never be approached by any piecemeal process of construction.” 

We thus have a conjectural estimate from below for C, and nothing more: 
not even a conjecture as to whether the cardinal C is regular or singular. 

Of course, the real problem consists not only in guessing a plausible conjec- 
ture, but in supporting it with sufficiently convincing indirect evidence for it to 
become widely accepted, even if not proved. What sort of evidence could this 
be? In discussing new axioms for set theory, Gödel writes: 

there may exist ... other (hitherto unknown) axioms of set theory which a 

more profound understanding of the concepts underlying logic and mathe- 

matics would enable us to recognize as implied by these concepts. 

Furthermore, however, even disregarding the intrinsic necessity of some 
new axiom, and even in case it had no intrinsic necessity at all, a decision 
about its truth is possible also in another way, namely, inductively by study- 
ing its “success,” that is, its fruitfulness in consequences and in particular in 
“verifiable” consequences, i.e., consequences demonstrable without the new 
axiom, whose proofs by means of the new axiom, however, are considerably 
simpler and easier to discover, and make it possible to condense into one 
proof many different proofs. The axioms for the system of real numbers, 
rejected by the intuitionists, have in this sense been verified to some extent 
owing to the fact that analytic number theory frequently allows us to prove 
number theoretic theorems which can subsequently be verified by elementary 
methods. A much higher degree of verification than that, however, is conceiv- 
able. There might exist axioms so abundant in their verifiable consequences, 
shedding so much light upon a whole discipline, and furnishing such powerful 
methods for solving given problems (and even solving them, as far as that 
is possible, in a constructivistic way) that quite irrespective of their intrinsic 
necessity they would have to be assumed at least in the same sense as any well 
established physical theory (K. Gödel, What is Cantor’s continuum problem? 
Amer. Math. Monthly, vol. 54, no. 9, 1947). 


There is little to add here to this ardently expressed hope. But see §8 of 
Chapter VII, where it is shown using an idea of Gödel’s own that any new 
independent axiom can shorten to an arbitrary extent the proofs of suitable 
assertions that are provable without the axiom. This result somewhat 
weakens our confidence in pragmatic criteria for truth. 


7.3. More than two decades after the publication of the first edition of this 
book, W. Hugh Woodin introduced interesting new ideas about the continuum 
hypothesis. 

His constructions enrich both our set-theoretic intuition and its formal 
language, in an intuitively consistent way. 

We will very briefly explain Woodin’s approach, following his notes “The 
continuum hypothesis. I, II,” Notices AMS, 48 (2001), no. 6, 567-576, and no. 
7, 681-690. We will work in the constructible universe of Section IV.1. 

Call a set X transitive if each element of an element of X belongs to X. 
The transitive closure of X is the minimal transitive set containing X. 
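
For hereditarily finite sets these two definitions can be tried out directly. The sketch below is only an illustration under stated assumptions: sets are modelled as Python `frozenset` values, "containing X" is read as "including X as a subset", and the names `is_transitive` and `transitive_closure` are chosen for this example.

```python
def is_transitive(x):
    """True if every element of an element of x belongs to x."""
    return all(z in x for y in x for z in y)

def transitive_closure(x):
    """Smallest transitive set that includes x as a subset."""
    closure = set(x)
    stack = list(x)
    while stack:
        y = stack.pop()
        for z in y:              # elements of elements must be added too
            if z not in closure:
                closure.add(z)
                stack.append(z)
    return frozenset(closure)

empty = frozenset()               # 0
one = frozenset({empty})          # {0}
two = frozenset({empty, one})     # {0, {0}}
x = frozenset({two})              # {2}: not transitive, since 0 is not in x
```

Here `two` is already transitive, while the closure of `x` has to pick up `one` and `empty` as well.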

Let k be an infinite cardinal, and H(k) the set of all sets X whose transitive 
closure is of cardinality < k. Accepting the axiom of choice, one sees that 
any constructible set belongs to some H(k). Let $k_0, k_1, k_2, \dots$ be the increasing 
sequence of the first infinite cardinals. Woodin easily reinterprets $H(k_0)$ as 
the semiring of natural numbers N with addition and multiplication, and, with 
some effort, $H(k_1)$ as a particular structure on the set of subsets of this semiring. 
These efforts are justified by providing a list of axioms for these structures that 
are intuitive and provide a basis for generalization to $H(k_2)$. 

Having thus set the stage, Woodin takes up $H(k_2)$ and introduces an extension 
of first-order logic and a new axiom modestly called (*). 

Here the grand finale arrives: in this context Woodin can prove that 
$2^{\aleph_0} = \aleph_2$. 

The following quotation from his second paper nicely concludes the discus- 
sion of this whole section: 

“So, is the continuum hypothesis solvable? Perhaps, I am not completely 
confident the ‘solution’ I have sketched is the solution, but it is for me a 
convincing evidence that there is a solution. Thus, I now believe the continuum 
hypothesis is solvable, which is a fundamental change in my view of set theory.” 


V 


Recursive Functions and Church’s Thesis 


1 Introduction. Intuitive Computability 


1.1. The first part of this book was primarily concerned with mathematical 
proof; we showed that the analogous concept in formal languages is that of 
formal deduction, after which the most interesting results were that certain 
intuitive mathematical assertions (such as the continuum hypothesis and its 
negation) are not deducible. 

Our primary concern in the second part of the book is the notion of a 
determinate computational process, that is, the processing of information, or, 
briefly, the notion of an algorithm. In §2 we give a precise and presumably 
complete characterization of everything that can be obtained using computa- 
tional algorithms. Then the most interesting results turn out to be assertions 
that certain intuitively defined functions cannot be computed by an algorithm 
(Chapter VI). 

Both the theory of proof and the theory of computation can be presented in 
large part independently of one another. This is the approach we have adopted, 
even though it does not correspond to the historical development. But when the 
machinery of both theories has been developed to a certain point, it becomes 
possible to apply each theory to investigate the other. The third part of the 
book is devoted to such applications. 

In this section we describe informally the main focal points of the theory of 
computability. We appeal to the reader’s intuitive notion of algorithms, which 
can be conveniently used to illuminate the structure and interrelations of the 
basic concepts. 

When we make these concepts precise in the next section, we shall not 
give a description of the algorithms themselves, but rather of their results, i.e., 
computable functions. The concept of an algorithm seems to lose too much in 
any formalization, while the notion of algorithmic computability seems not to 
lose anything essential. 


1.2. We now introduce several simple basic concepts. Let X and Y be two sets. 
A partial function (or mapping) from X to Y is any pair (D(f), f) consisting 
of a subset $D(f) \subset X$ and a mapping $f : D(f) \to Y$. Here D(f) (instead of 
the earlier dom f) is called the domain of definition of f; f is defined at a point 
$x \in X$ if $x \in D(f)$; f is nowhere defined if D(f) is empty; and there exists a 
unique nowhere defined partial function. 

We let $\mathbb{Z}^+ = \{1, 2, 3, \dots\}$ denote the set of natural numbers, excluding zero. 
(It is not necessary, only convenient, to exclude zero.) If $n \ge 1$, we let $(\mathbb{Z}^+)^n$ 
denote the n-fold direct product of $\mathbb{Z}^+$ with itself, i.e., the set of ordered n-tuples 
$(x_1, \dots, x_n)$, $x_i \in \mathbb{Z}^+$. It is convenient to let $(\mathbb{Z}^+)^0$ denote the set consisting 
of an arbitrary element, denoted by “-”. The basic objects of our concern will 
be partial functions from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^n$ for various m and n. When we classify 
these functions according to their computability, the reader can think of 
the word “program” as referring to a program for a universal computer that is 
written without regard to time or memory limitations. Here every program for 
computing a function has a special “blank space” in which to insert the value 
of the argument. 


1.3. The basic informal definitions. (a) A partial function f from $(\mathbb{Z}^+)^m$ to 
$(\mathbb{Z}^+)^n$ is called computable if there exists a “program” that, whenever a vector 
$x \in (\mathbb{Z}^+)^m$ is entered in the input, gives as output 

$$\begin{cases} f(x), & \text{if } x \in D(f); \\ 0, & \text{if } x \notin D(f). \end{cases}$$

Here 0 merely indicates that f is not defined at x; we could allow the output 
in this case to be anything not in $(\mathbb{Z}^+)^n$. 


(b) A partial function f from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^n$ is called semicomputable if 
there exists a “program” that, whenever a vector $x \in (\mathbb{Z}^+)^m$ is entered in the 
input, gives f(x) as output if $x \in D(f)$, and either gives 0 as output or else 
works infinitely long without stopping if $x \notin D(f)$. 

In particular, computable functions are semicomputable, and everywhere 
defined semicomputable functions are computable. 


(c) A partial function f is called noncomputable if it does not satisfy 
condition (b) (and a fortiori (a)). 


1.4. Comments 


(a) The most basic of these three concepts is semicomputability, since 
computability reduces to this property. In fact, to determine whether a 
semicomputable function is computable, we proceed as follows. 

Let $X \subset Y$ be two sets. By the characteristic function of X in Y we mean 
the function $\chi_X : Y \to \mathbb{Z}^+$ such that 

$$\chi_X(y) = \begin{cases} 1, & \text{if } y \in X; \\ 2, & \text{if } y \notin X. \end{cases}$$

Note that $\chi_X$ is everywhere defined on Y. 
Now let f be a semicomputable function from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^n$. If f were 
computable as well, then the characteristic function of D(f) would also be 
computable: simply add to the program that computes f the instructions 
“send 0 to 2, and anything not 0 to 1, and print as output.” Conversely, if 
$\chi_{D(f)}$ is computable, then so is f: in front of the program that semicomputes 
f, put the program that computes $\chi_{D(f)}$ and then the instruction to give 0 as 
output immediately if $\chi_{D(f)}(x) = 2$ and to continue with the program for f 
with x as the argument if $\chi_{D(f)}(x) = 1$. Thus, since the everywhere defined 
function $\chi_{D(f)}$ is computable if and only if it is semicomputable, we have: f is 
computable $\Leftrightarrow$ f is semicomputable and $\chi_{D(f)}$ is semicomputable. Later, we shall 
first formalize the concept of semicomputability, and then take the right side of 
this equivalence as the formalization of computability. 
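
The two constructions in this argument can be imitated with ordinary functions. The Python sketch below is an illustration, not a formal model: "programs" are Python callables, the sample function f (defined as x / 2 on even x and undefined on odd x) and all names are assumptions of this example, and the semicomputing program really does loop forever outside D(f).

```python
def semi_f(x):
    """Semicomputes f(x) = x // 2 on even x; never halts on odd x."""
    while x % 2 != 0:
        pass                 # deliberate infinite loop: f is undefined here
    return x // 2

def chi_domain(x):
    """Characteristic function of D(f): 1 on D(f), 2 elsewhere."""
    return 1 if x % 2 == 0 else 2

def make_computable(semi, chi):
    """Put the test chi in front of the semicomputing program."""
    def program(x):
        if chi(x) == 2:
            return 0         # conventional output for "f is not defined at x"
        return semi(x)       # safe: x lies in D(f), so semi halts
    return program

def chi_from_computable(program):
    """Converse direction: send output 0 to 2 and anything else to 1."""
    return lambda x: 2 if program(x) == 0 else 1

f = make_computable(semi_f, chi_domain)
```

Calling `f(7)` returns 0 at once, whereas `semi_f(7)` would never return; this is the whole difference between the two notions.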


(b) There exist noncomputable functions. In fact, any program is a finite 
text in a finite alphabet, so that the set of programs is countable, while the 
set of all functions $\mathbb{Z}^+ \to \mathbb{Z}^+$ is uncountable. (For a critical discussion of this 
argument, see 1.5 below.) 


AN EXAMPLE OF A NONCOMPUTABLE FUNCTION. We consider the language 
of arithmetic SAr, which was described in §10 of Chapter II, and number the 
formulas of this language as explained in §11 of Chapter II. We define a function 
f by stipulating that 

$$f(x) \begin{cases} = 1, & \text{if the } x\text{th formula is true in the standard interpretation;} \\ \text{is not defined}, & \text{if the } x\text{th formula is false.} \end{cases}$$


The function f is noncomputable. In Chapter VII we shall see that this follows 
because the set D(f) is not definable in arithmetic, by Tarski’s theorem. 

In other words, it is impossible (even in principle) to distinguish the set 
of all number-theoretic truths by writing a single program (even a very long 
and complicated one) that could tell from a statement’s formulation whether it 
is true. Of course, to prove this result requires a much deeper analysis of the 
concept of computability. 


(c) There exist functions that are semicomputable but not computable. 
We first give a typical example of a program that semicomputes a function. 
We consider the following function f from $\mathbb{Z}^+$ to $\mathbb{Z}^+$, which is defined in terms 
of Fermat’s problem: 

$$f(n) \begin{cases} = 1, & \text{if there exist } x, y, z \in \mathbb{Z}^+ \text{ for which } x^{n+2} + y^{n+2} = z^{n+2}; \\ \text{is not defined}, & \text{otherwise.} \end{cases}$$

Here is a program that semicomputes f: after entering n in the input, run 
through all vectors (x, y, z) in a suitable order. (For example, according to 
increasing x + y + z, and for given x + y + z, in lexicographic order.) For each 
such vector verify whether $x^{n+2} + y^{n+2} = z^{n+2}$. If this equation holds, give 1 
as output; otherwise, go on to the next (x, y, z). 
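
The search just described can be sketched in Python. This is an illustration under stated assumptions: the names `triples`, `semicompute`, and `f` are chosen here, and the optional `max_sum` cutoff is an addition of this sketch (not part of the program in the text) so that an unsuccessful search can be stopped.

```python
from itertools import count

def triples():
    """Yield all (x, y, z) of positive integers by increasing x + y + z,
    and in lexicographic order within each value of the sum."""
    for s in count(3):
        for x in range(1, s - 1):
            for y in range(1, s - x):
                yield x, y, s - x - y

def semicompute(exponent, max_sum=None):
    """Search for a solution of x**e + y**e == z**e and return 1 if found.
    With max_sum=None the search is unbounded, as in the text."""
    for x, y, z in triples():
        if max_sum is not None and x + y + z > max_sum:
            return None      # search abandoned; this decides nothing
        if x**exponent + y**exponent == z**exponent:
            return 1

def f(n, max_sum=None):
    """The function of the text: exponent n + 2."""
    return semicompute(n + 2, max_sum)
```

For exponent 2 the search halts (it reaches a Pythagorean triple), while `f(1)` with no cutoff would run forever. The cutoff does not turn the procedure into a computation of f, since giving up at a finite bound proves nothing about larger triples.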

Hence, f is semicomputable. But it is not known whether f is 
computable. According to Fermat’s conjecture, f is nowhere defined (and 
hence computable!). The strongest theoretical results known concerning f (the 
so-called criteria of Kummer, Wieferich, Vandiver, and others) may be regarded 
as a sort of approximation to proving that f is computable, not that f is nowhere 
defined. That is, in order to verify the Fermat conjecture successively for various 
values of n, we must perform a (machine) computation (whose size grows 
rapidly with n) to determine $\chi_{D(f)}$ at the point n, when this determination is 
possible. 

There is an analogous example of a semicomputable function that we 
actually know is not computable. In Chapter VI we prove that there exists 
a polynomial $P(t, x_1, \dots, x_n)$ with integer coefficients such that the function 

$$g(t) \begin{cases} = 1, & \text{if the equation } P(t, x_1, \dots, x_n) = 0 \text{ is solvable with } x_1, \dots, x_n \in \mathbb{Z}^+; \\ \text{is not defined}, & \text{otherwise,} \end{cases}$$


is not computable. This function is semicomputable by the same argument as 
in the case of the function connected with Fermat’s equation. 


1.5. Critical discussion of the above proofs. Before proceeding further, we 
consider from a more critical point of view, for example, the argument in 1.4(b). 
The first weak point that catches our attention is that we did not say precisely 
what a program is. But this is not essential; for any fixed definition we choose, 
a program must in any case be a text in a finite alphabet if it at all corresponds 
to our intuitive notions, and there are countably many such texts. A much 
stronger objection to the argument goes roughly as follows: what justification 
do we have for working with just one definition of what a program is? Could 
there perhaps exist an increasing hierarchy of precisely describable “methods 
of computation,” so that for every function from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ we could choose a 
corresponding program that could compute this function? 

A fundamental discovery in the theory of computability was that this last 
question has a negative answer. We now have a unique and final formal notion 
that corresponds to the intuitive idea of semicomputability. It can be stated as 
follows: 


1.6. Church’s Thesis (weakest form). It is possible to give explicitly: 


(a) a family of basic semicomputable functions; 
(b) a family of elementary operations that, starting from any semicomputable 
functions, allow new semicomputable functions to be constructed; 


with the property that any semicomputable function can be obtained in a finite 
number of steps, where each step consists in applying one of the 
elementary operations to the functions constructed before and those in the 
family (a). 


¹ Since the publication of the first edition, Fermat’s conjecture has been proved by Wiles, 
so now we know that f is computable and nowhere defined. The reader may wish to replace 
f in our discussion by another function, say the characteristic function of the set of 
numbers n for which the primes $p_n$ and $p_{n+1}$ satisfy $p_{n+1} = p_n + 2$. This is another old 
number-theoretic problem. It remains unsolved. 


1.7. Comment. Church’s thesis will be given a precise formulation in the next 
section: the basic functions and the elementary operations will be given explicitly. 
The exact mathematical theory of computability begins at that point. 
But it seemed important to indicate first the general significance of the discovery 
that such families of functions and operations exist at all and can even be given 
explicitly, a result that is far from obvious. 


This is an experimental fact, one of the most important discovered by logic. 
In the next section we discuss evidence of its value and usefulness. Now we 
merely note that this fact is related to the finiteness of the basic logical and 
set-theoretic principles of mathematics (implicit, for example, in $L_1\mathrm{Set}$), but is not 
identical to this finiteness. 


2 Partial Recursive Functions 


2.1. In this section we give the precise definition and the basic properties of a 
class of partial functions from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^n$, which we take as an adequate 
formalization of the class of semicomputable functions. We give the definition 
in a way parallel to the statement of Church’s thesis in 1.6. 


2.2. The basic functions 
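$\operatorname{suc} : \mathbb{Z}^+ \to \mathbb{Z}^+$, $\operatorname{suc}(x) = x + 1$; 
$1^{(n)} : (\mathbb{Z}^+)^n \to \mathbb{Z}^+$, $1^{(n)}(x_1, \dots, x_n) = 1$, $n \ge 0$; 
$\operatorname{pr}^n_i : (\mathbb{Z}^+)^n \to \mathbb{Z}^+$, $\operatorname{pr}^n_i(x_1, \dots, x_n) = x_i$, $n \ge 1$. 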

2.3. The elementary operations on partial functions 

(a) Composition (or substitution). This operation associates to every pair 
of partial functions f from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^n$ and g from $(\mathbb{Z}^+)^n$ to $(\mathbb{Z}^+)^p$ the 
function $h = g \circ f$ from $(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^p$ that is defined as follows: 

$$D(g \circ f) = f^{-1}(D(g)) = \{x \in (\mathbb{Z}^+)^m \mid x \in D(f),\ f(x) \in D(g)\};$$
$$(g \circ f)(x) = g(f(x)).$$

(b) Juxtaposition. This operation associates to partial functions $f_i$ from 
$(\mathbb{Z}^+)^m$ to $(\mathbb{Z}^+)^{n_i}$, $i = 1, \dots, k$, the function $(f_1, \dots, f_k)$ from $(\mathbb{Z}^+)^m$ to 
$(\mathbb{Z}^+)^{n_1} \times \cdots \times (\mathbb{Z}^+)^{n_k}$ that is defined as follows: 

$$D((f_1, \dots, f_k)) = D(f_1) \cap \cdots \cap D(f_k);$$
$$(f_1, \dots, f_k)(x_1, \dots, x_m) = (f_1(x_1, \dots, x_m), \dots, f_k(x_1, \dots, x_m)).$$

(c) Recursion. This operation associates to a pair of partial functions f from 
$(\mathbb{Z}^+)^n$ to $\mathbb{Z}^+$ and g from $(\mathbb{Z}^+)^{n+2}$ to $\mathbb{Z}^+$ the partial function h from $(\mathbb{Z}^+)^{n+1}$ 
to $\mathbb{Z}^+$ that is defined by recursion on the last argument: 

$$h(x_1, \dots, x_n, 1) = f(x_1, \dots, x_n) \quad \text{(initial condition)};$$
$$h(x_1, \dots, x_n, k + 1) = g(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)), \ \text{for } k \ge 1 \quad \text{(recursive step)}.$$

The domain of definition D(h) is also defined by recursion: 

$$(x_1, \dots, x_n, 1) \in D(h) \Leftrightarrow (x_1, \dots, x_n) \in D(f),$$
$$(x_1, \dots, x_n, k + 1) \in D(h) \Leftrightarrow (x_1, \dots, x_n, k) \in D(h) \ \text{and}\ (x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)) \in D(g), \ \text{for } k \ge 1.$$


(d) The $\mu$-operator. This operation associates to a partial function $f$ from
$(Z^+)^{n+1}$ to $Z^+$ the partial function $h$ from $(Z^+)^n$ to $Z^+$ that is defined as
follows:

$$D(h) = \{(x_1, \dots, x_n) \mid \exists\, x_{n+1} \ge 1,\ f(x_1, \dots, x_n, x_{n+1}) = 1 \text{ and } (x_1, \dots, x_n, k) \in D(f) \text{ for all } k \le x_{n+1}\};$$
$$h(x_1, \dots, x_n) = \min\{x_{n+1} \mid f(x_1, \dots, x_n, x_{n+1}) = 1\}.$$


The general role of $\mu$ is to introduce “implicitly defined” functions, as is
often done in many areas of mathematics. Three remarks about the definition
of $\mu$ should be made at this point. First, we obviously chose the minimal $y$
with $f(x_1, \dots, x_n, y) = 1$ in order to ensure that the function $h$ is single-valued.
The second observation is that at first glance, it might seem that the domain of
definition of $h$ is artificially narrow. If, for example, we have $f(x_1, \dots, x_n, 2) = 1$
and $f(x_1, \dots, x_n, 1)$ is not defined, then we have taken $h(x_1, \dots, x_n)$ to be undefined,
rather than equal to 2. This is done because we want to preserve intuitive
semicomputability in going from $f$ to $h$, as will be discussed in somewhat greater
detail below (see 2.7(a)).

Finally, we note that all the operations before $\mu$, if applied to everywhere
defined functions, give an everywhere defined function. This is obviously not
the case for $\mu$. Thus, $\mu$ is the only one of the operations that causes partial
functions to arise unavoidably.
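
The basic functions of 2.2 and the four operations of 2.3 can be sketched in Python. This is only an illustration under stated assumptions, not part of the formal development: a partial function is modelled as a callable that returns None where it is undefined (a real semicomputation might instead run forever), juxtaposition is restricted to scalar-valued components, and the names `one`, `pr`, `compose`, `juxtapose`, `recursion`, `mu` and the search bound `limit` are hypothetical.

```python
# Assumption: None stands for "undefined"; true divergence is not modelled.

def suc(x):
    """suc(x) = x + 1."""
    return x + 1

def one(*xs):
    """The constant function 1^(n), for any number n >= 0 of arguments."""
    return 1

def pr(n, i):
    """Projection pr_i^n: (x_1, ..., x_n) -> x_i, with 1-based index i."""
    def p(*xs):
        assert len(xs) == n
        return xs[i - 1]
    return p

def compose(g, f):
    """h = g o f; f may return one value or a tuple of values passed to g."""
    def h(*xs):
        y = f(*xs)
        if y is None:
            return None
        return g(*y) if isinstance(y, tuple) else g(y)
    return h

def juxtapose(*fs):
    """(f_1, ..., f_k): defined only where every component is defined."""
    def h(*xs):
        ys = tuple(f(*xs) for f in fs)
        return None if any(y is None for y in ys) else ys
    return h

def recursion(f, g):
    """h(x, 1) = f(x); h(x, k + 1) = g(x, k, h(x, k))."""
    def h(*args):
        *xs, last = args
        value = f(*xs)
        for k in range(1, last):
            if value is None:
                return None
            value = g(*xs, k, value)
        return value
    return h

def mu(f, limit=10_000):
    """h(x) = least y with f(x, y) = 1, if f(x, k) is defined for all k <= y."""
    def h(*xs):
        for y in range(1, limit + 1):
            v = f(*xs, y)
            if v is None:       # f undefined before the value 1 is reached
                return None
            if v == 1:
                return y
        return None             # bound exhausted; stands in for divergence
    return h

# Addition built from suc by recursion on the last argument.
sum2 = recursion(suc, compose(suc, pr(3, 3)))
```

With these definitions `sum2(3, 4)` evaluates to 7, and `mu` applied to a function with $f(x, 2) = 1$ but $f(x, 1)$ undefined returns None, matching the narrow domain chosen in the second remark above.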


2.4. Definition. 


(a) A sequence of partial functions $f_1, \dots, f_N$ is called a partial recursive
(respectively primitive recursive) description of the function $f_N = f$ if

$f_1$ belongs to the family of basic functions;

$f_i$, $i \ge 2$, either belongs to the family of basic functions, or else is
obtained by applying one of the elementary operations (respectively
one of the elementary operations other than $\mu$) to certain of
the functions $f_1, \dots, f_{i-1}$.

(b) A function $f$ is called partial recursive (respectively primitive recursive)
if it admits a partial recursive (respectively primitive recursive) description.


(The analogy with the definition of a deduction in a formal language immediately
catches our attention, and can sometimes be of use.)

2.5. Church’s Thesis (usual form)


(a) A function f is semicomputable if and only if it is partial recursive. 
(b) A function $f$ is computable if and only if both $f$ and $\chi_{D(f)}$ are partial
recursive.


Remark on terminology. Everywhere defined partial recursive functions are 
also called general recursive functions. If the domain of definition is either clear 
or not essential in a given context, we simply use the term “recursive.” (Note 
that every primitive recursive function is general recursive.) 


2.6. Use of Church’s thesis. Before discussing in detail the arguments supporting 
Church’s thesis, we indicate how it is used in practice in mathematics. Two basic 
applications are especially evident in the literature. 


(a) Church’s thesis used for a definition of algorithmic undecidability.
Suppose we have a countable sequence of mathematical “problems”
$P_1, P_2, \dots$. Further, suppose that each problem has a “yes” or “no” answer,
and that the conditions in $P_n$ are written out “effectively” as a function
of $n$. Such a sequence $P = (P_n)$ is called a “mass problem.” We associate to
such a problem a function $f$ from $Z^+$ to $Z^+$:

$$D(f) = \{i \in Z^+ \mid P_i \text{ has “yes” for an answer}\};$$
$$f(i) = 1 \quad \text{if } i \in D(f).$$


A mass problem $P$ is called algorithmically decidable if the functions $f$ and
$\chi_{D(f)}$ are partial recursive. Otherwise, $P$ is called algorithmically undecidable.
We also distinguish the case in which only $\chi_{D(f)}$ is not partial recursive from the
case in which even $f$ is not partial recursive. The second type of undecidability
is worse than the first; we saw examples of this in §1. Finally, a whole hierarchy 
of “degrees of undecidability” can be rigorously defined and investigated. 

A well-known example of a mass problem is the problem of word identities
in groups. Let $G$ be a finitely defined group, and let $a_1, \dots, a_n \in G$ be elements.
A “reduced word” in $a_1, \dots, a_n$ is an expression of the form $a_{i_1}^{\varepsilon_1} \cdots a_{i_k}^{\varepsilon_k}$, where
$k \ge 1$, $\varepsilon_j = \pm 1$, and $\varepsilon_j = \varepsilon_{j+1}$ whenever $i_j = i_{j+1}$. We number all
the reduced words and ask the question $P_n$: “Does the $n$th word represent the
unit element of the group $G$?” The “mass problem” $(P_n)$ turns out to be
algorithmically decidable for certain groups $G$ and elements $a_1, \dots, a_n$ and
algorithmically undecidable for others (Novikov, Boone, Higman). The function
$f$ in this case is always partial recursive, but $\chi_{D(f)}$ is not always (see
Chapter VIII).

For another example of an undecidable problem, this one connected with 
Diophantine equations, see Chapter VI. 


(b) Church’s thesis as a heuristic principle. The intuitive notion of
“semicomputability” at first seems broader than the notion of “partial recursiveness,”
and many problems concerning partial recursive functions become
much easier if we replace the conditions in the problems by informal ideas and
allow such ideas to be used to solve the problems. For example, the formula
$e = \lim_{n \to \infty} (1 + 1/n)^n$ and the Euclidean algorithm make it intuitively clear that
the functions $f, g: Z^+ \to Z^+$ given by


f(n) = the nth digit in the decimal expansion of e, 


g(n) = the nth prime number 


are computable, but the verification that they are recursive requires rather 
painstaking constructions. 

Church’s thesis allows us to solve such problems in two stages: (1) finding an 
informal solution using any intuitive algorithms we need, and (2) formalizing 
the solution. The second stage presupposes a certain proficiency in finding a 
partial recursive description for a wide variety of semicomputable functions, 
and Church’s thesis assures us that such a description exists. 
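
Stage (1) for the function $g(n)$ = the $n$th prime can be as informal as the following Python sketch. Trial division is chosen here purely for illustration and is an assumption, not the algorithm the text has in mind; the helper names `is_prime` and `nth_prime` are hypothetical. Stage (2), producing a partial recursive description, is what the constructions of the next sections prepare.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division: n is prime if no d in [2, sqrt(n)] divides it."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))

def nth_prime(n: int) -> int:
    """Return the n-th prime for n >= 1 by scanning 2, 3, 4, ... in order."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate
```

The loop always terminates because there are infinitely many primes, which is the informal reason $g$ is everywhere defined.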

As proofs of recursiveness become more and more numerous in the literature, 
it becomes increasingly common to go through only the first stage of the solu- 
tion; a striking example of this is Hartley Rogers’ book Theory of Recursive 
Functions and Effective Computability (McGraw-Hill, New York, 1967). We 
shall also take such liberties toward the end of this book. All the same, there 
is a certain danger in this practice. It is possible that the habit of increasingly 
using informal arguments delayed the discovery of such a fundamental fact as 
the result that recursively enumerable sets and Diophantine sets coincide. 


2.7. Arguments in support of Church’s thesis 


(a) First of all, the basic functions clearly must be computable, no mat- 
ter how we precisely define the notion of computability. Furthermore, when the 
elementary operations are applied to semicomputable functions, they again give 
a semicomputable function. A program to semicompute the latter function can 
easily be put together from the programs that semicompute the original functions.
We shall consider only the case of the $\mu$-operator in detail, leaving the
simple construction of the other three programs to the reader.

In the notation of 2.3(d), let $f$ be a semicomputable function from
$(Z^+)^{n+1}$ to $Z^+$. In order to compute $h(x_1, \dots, x_n)$, we go through the vectors
$(x_1, \dots, x_n, 1), (x_1, \dots, x_n, 2), \dots$ in the order of increasing last coordinate,
and compute the values of $f$ at these vectors. If $(x_1, \dots, x_n) \in D(h)$, where $h$ is
obtained from $f$ by applying the $\mu$-operator, then the program for $f$ successively
computes

$$f(x_1, \dots, x_n, 1), \ \dots, \ f(x_1, \dots, x_n, y - 1),$$

and finally $f(x_1, \dots, x_n, y) = 1$. The least such $y$, if it exists, must be given as
output; it will be the value of $h$ at the point $(x_1, \dots, x_n)$. On the other hand,
if it turns out that one of the values $f(x_1, \dots, x_n, k)$ (before we reach $f = 1$) is
not defined, then either the program that semicomputes $f$ will work infinitely
long, or else it will give an answer not in $Z^+$, which must then be given as
output. But then, by definition, $h$ is not defined at the point $(x_1, \dots, x_n)$, and
the behavior of the program for $h$ still agrees with the definition of $h$ being
semicomputable.

From all this we conclude that partial recursive functions are semicomputable.
However, the stronger part of Church’s thesis is the converse: semicomputable
functions are partial recursive. (The definition of computability in terms of
semicomputability is simply taken from §1 without any changes.) As has been
said, this result is an experimental fact. The experimental evidence for it is
divided into several classes, which we consider in (b)-(d) below.


(b) In the literature we find a huge collection of recursive descriptions of 
various computable and semicomputable functions. See, for example, Rozsa 
Péter, Recursive Functions (Academic Press, New York, 1967). We shall give 
part of this list in the next section. We also find certain techniques for composing 
recursive descriptions that are applicable to entire classes of (semi)computable 
functions. Every time an author has tried to find a partial recursive description 
of a (semi)computable function, he has met with success. 


(c) Turing proposed a mathematical characterization of an abstract com- 
puter, and gave strong arguments to the effect that this computer is universal, 
i.e., it can (semi)compute any (semi)computable function. His arguments came
from a detailed analysis of the characteristic features of determinate computa- 
tional processes. (We again recall that we have not at all concerned ourselves 
with formalizing computational processes, but only with the results of such pro- 
cesses.) It turned out that the class of functions that are semicomputable by 
Turing machines exactly coincides with the class of partial recursive functions. 


(d) Church, Post, Markov, Kolmogorov, Uspenskii, and others have pro- 
posed other deterministic schemes for processing information of a general (not 
necessarily number-theoretic) character. In all cases it has turned out that if the 
sets of input and output are numbered in a suitable “effective” way, these methods
lead to a class of maps from $Z^+$ to $Z^+$ that coincides with some subclass
of the partial recursive functions.


For further discussion of Church’s thesis, we refer the reader to the literature; 
see, in particular, S. Kleene, Introduction to Metamathematics (Van Nostrand, 
New York—Toronto, 1952). 


3 Basic Examples of Recursiveness 


3.1. In this section we give a short list of recursive functions and a selection of 
basic techniques for proving recursiveness. Both these lists will subsequently be 
enlarged when needed (in particular, see Chapter VII). 


3.2. (a) $\mathrm{sum}_2: (Z^+)^2 \to Z^+$, $(x_1, x_2) \mapsto x_1 + x_2$.

Use recursion on $x_2$, starting from the initial condition
$$x_1 + 1 = \mathrm{sum}_2(x_1, 1) = \mathrm{suc}(x_1)$$
and applying the recursive step
$$x_1 + k + 1 = \mathrm{sum}_2(x_1, k + 1) = \mathrm{suc}(\mathrm{sum}_2(x_1, k)).$$

(b) $\mathrm{sum}_n: (Z^+)^n \to Z^+$, $(x_1, \dots, x_n) \mapsto \sum_{i=1}^n x_i$, $n \ge 3$.

Suppose that we already know that $\mathrm{sum}_{n-1}$ is recursive. We can obtain
$\mathrm{sum}_n$ by juxtaposition and composition as follows:
$$\mathrm{sum}_n = \mathrm{sum}_2 \circ (\mathrm{sum}_{n-1} \circ (\mathrm{pr}_1^n, \dots, \mathrm{pr}_{n-1}^n), \mathrm{pr}_n^n).$$

Another version is to use recursion on $x_n$, starting from the initial condition
$\mathrm{suc} \circ \mathrm{sum}_{n-1}$ and applying the recursive step
$$\sum_{i=1}^{n-1} x_i + k + 1 = \mathrm{suc}(\mathrm{sum}_n(x_1, \dots, x_{n-1}, k)).$$

This choice of recursive descriptions, even of “natural” ones, will become even
more numerous as the functions become more complicated.


3.3. (a) $\mathrm{prod}_2: (Z^+)^2 \to Z^+$, $(x_1, x_2) \mapsto x_1 x_2$.

Use recursion on $x_2$, starting from the initial condition $x_1$, and applying the
recursive step
$$x_1 (k + 1) = x_1 k + x_1 = \mathrm{sum}_2(x_1 k, x_1).$$


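(b) $\mathrm{prod}_n: (Z^+)^n \to Z^+$, $(x_1, \dots, x_n) \mapsto x_1 \cdots x_n$, $n \ge 3$:
$$\mathrm{prod}_n = \mathrm{prod}_2 \circ (\mathrm{prod}_{n-1} \circ (\mathrm{pr}_1^n, \dots, \mathrm{pr}_{n-1}^n), \mathrm{pr}_n^n).$$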
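
The recursive descriptions of 3.2 and 3.3 can be checked mechanically. The sketch below assumes everywhere defined functions on the positive integers; the helper `recursion` is a hypothetical stand-in for the recursion operation of 2.3(c), and `sum_n` unrolls the composition $\mathrm{sum}_2 \circ (\mathrm{sum}_{n-1} \circ \dots, \mathrm{pr}_n^n)$ into a loop.

```python
def recursion(f, g):
    """h(x, 1) = f(x); h(x, k + 1) = g(x, k, h(x, k)), for total f and g."""
    def h(*args):
        *xs, last = args
        value = f(*xs)
        for k in range(1, last):
            value = g(*xs, k, value)
        return value
    return h

def suc(x):
    return x + 1

# sum_2(x1, 1) = suc(x1); sum_2(x1, k + 1) = suc(sum_2(x1, k)).
sum2 = recursion(suc, lambda x1, k, prev: suc(prev))

# prod_2(x1, 1) = x1; prod_2(x1, k + 1) = sum_2(prod_2(x1, k), x1).
prod2 = recursion(lambda x1: x1, lambda x1, k, prev: sum2(prev, x1))

def sum_n(*xs):
    """sum_n for n >= 2, folding sum_2 over the arguments from the left."""
    total = xs[0]
    for x in xs[1:]:
        total = sum2(total, x)
    return total
```

Here `sum2(3, 4)` is 7 and `prod2(3, 4)` is 12, each obtained using nothing but repeated applications of `suc`.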
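3.4. (a) $Z^+ \to Z^+$:
$$x \mapsto x \mathbin{\dot{-}} 1 = \begin{cases} x - 1, & \text{if } x \ge 2; \\ 1, & \text{if } x = 1. \end{cases}$$

Use recursion with the functions
$$f: (Z^+)^0 \to Z^+, \quad \cdot \mapsto 1;$$
$$g = \mathrm{pr}_1^2: (Z^+)^2 \to Z^+, \quad (x_1, x_2) \mapsto x_1.$$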
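(b) $(Z^+)^2 \to Z^+$:
$$(x_1, x_2) \mapsto x_1 \mathbin{\dot{-}} x_2 = \begin{cases} x_1 - x_2, & \text{if } x_1 > x_2; \\ 1, & \text{if } x_1 \le x_2. \end{cases}$$
This “truncated difference” is obtained by applying recursion to the functions
$$f(x_1) = x_1 \mathbin{\dot{-}} 1;$$
$$g(x_1, x_2, x_3) = x_3 \mathbin{\dot{-}} 1.$$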
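
As a check on 3.4, the following Python sketch builds the predecessor and the truncated difference by the recursions just described. It assumes everywhere defined functions; `recursion`, `pred` and `monus` are hypothetical names, and the zero-argument initial function is written as a lambda with no parameters.

```python
def recursion(f, g):
    """h(x, 1) = f(x); h(x, k + 1) = g(x, k, h(x, k)), for total f and g."""
    def h(*args):
        *xs, last = args
        value = f(*xs)
        for k in range(1, last):
            value = g(*xs, k, value)
        return value
    return h

# 3.4(a): f is the constant 1 of zero arguments, g = pr_1^2, so h(k + 1) = k.
pred = recursion(lambda: 1, lambda k, prev: k)

# 3.4(b): f(x1) = pred(x1), g(x1, x2, x3) = pred(x3).
monus = recursion(pred, lambda x1, k, prev: pred(prev))
```

Note that in this convention the truncated difference bottoms out at 1 rather than 0, since zero is not available in $Z^+$; for instance `monus(3, 7)` is 1.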


3.5. $F: (Z^+)^n \to Z^+$, where $F$ is any polynomial in $x_1, \dots, x_n$ with integer
coefficients that takes values only in $Z^+$.

If all the coefficients in $F$ are nonnegative, then $F$ is a sum of products
of the functions $\mathrm{pr}_i^n: (x_1, \dots, x_n) \mapsto x_i$. Otherwise, we write $F = F^+ - F^-$,
where $F^+$ and $F^-$ have nonnegative coefficients, and at all points of $(Z^+)^n$
the nontruncated difference coincides with the truncated difference $F^+ \mathbin{\dot{-}} F^-$
because of the assumption concerning $F$.

We shall often use the recursiveness of the function $(x_1 - x_2)^2 + 1$, or
$h = (f - g)^2 + 1$, where $f$ and $g$ are recursive. This technique allows us to
identify the set on which $f = g$ with the “level set of $h$ at 1,” i.e., the set on
which $h = 1$.


3.6. “Step functions”: for each $a, b, x_0 \in Z^+$, the function defined by
$$s_{x_0}^{a,b}(x) = \begin{cases} a, & \text{for } x \le x_0; \\ b, & \text{for } x > x_0. \end{cases}$$

If $x_0 = 1$, we obtain this function by recursion with initial value $a$ and all the
succeeding values $b$. In the general case we set
$$s_{x_0}^{a,b}(x) = s_1^{a,b}(x + 1 \mathbin{\dot{-}} x_0).$$


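3.7. $\mathrm{rem}(x, y)$ = the remainder in $[1, x]$ (since we cannot use zero!) when $y$ is
divided by $x$.
We have
$$\mathrm{rem}(x, 1) = 1,$$
$$\mathrm{rem}(x, y + 1) = \begin{cases} 1, & \text{if } \mathrm{rem}(x, y) = x; \\ \mathrm{suc} \circ \mathrm{rem}(x, y), & \text{if } \mathrm{rem}(x, y) \ne x. \end{cases}$$

We now apply a somewhat artificial technique. We consider the step function
$s = s_1^{2,1}$, i.e., $s(1) = 2$ and $s(x) = 1$ if $x \ge 2$, and we set
$$\varphi(x, y) = s((\mathrm{rem}(x, y) - x)^2 + 1).$$
Obviously,
$$\mathrm{rem}(x, y) \ne x \iff \varphi(x, y) = 1,$$
$$\mathrm{rem}(x, y) = x \iff \varphi(x, y) = 2,$$
so that
$$\mathrm{rem}(x, y + 1) = 2\,\mathrm{suc}(\mathrm{rem}(x, y)) \mathbin{\dot{-}} \varphi(x, y)\,\mathrm{suc}(\mathrm{rem}(x, y)).$$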

This gives a recursive definition of rem. 
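
The two forms of the recursion for rem can be compared numerically. The Python sketch below is an illustration only; `rem_cases` follows the case distinction, `rem_step` follows the single formula with the step function $s = s_1^{2,1}$ and the truncated difference, and the helper names `s`, `monus`, `rem_cases`, `rem_step` are hypothetical.

```python
def s(z):
    """Step function s_1^{2,1}: s(1) = 2 and s(z) = 1 for z >= 2."""
    return 2 if z == 1 else 1

def monus(a, b):
    """Truncated difference on Z+: a - b if a > b, otherwise 1."""
    return a - b if a > b else 1

def rem_cases(x, y):
    """rem(x, 1) = 1; rem(x, y + 1) = 1 if rem(x, y) = x, else rem(x, y) + 1."""
    r = 1
    for _ in range(1, y):
        r = 1 if r == x else r + 1
    return r

def rem_step(x, y):
    """Same recursion, written without a case distinction."""
    r = 1
    for _ in range(1, y):
        phi = s((r - x) ** 2 + 1)           # 2 if r == x, else 1
        r = monus(2 * (r + 1), phi * (r + 1))
    return r
```

Both functions agree with the closed form `(y - 1) % x + 1`, which is the remainder shifted into the interval $[1, x]$.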
We next describe this technique in a more general form. 


3.8. Suppose $h$ is defined by “recursion with conditions,” i.e.,
$$h(x_1, \dots, x_n, 1) = f(x_1, \dots, x_n);$$
$$h(x_1, \dots, x_n, k + 1) = g_i(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)),$$
if the condition $C_i(x_1, \dots, x_n, k, h)$ holds, $i = 1, \dots, m$, where the exhaustive
and mutually exclusive conditions $C_i$ are given in the form
$$C_i \text{ is fulfilled} \iff \psi_i(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)) = 1$$
with $\psi_i$ an everywhere defined recursive function that takes only the values
1 and 2. Then we can write the recursive step as follows:
$$h(x_1, \dots, x_n, k + 1) = 2 \sum_{i=1}^m g_i(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)) - \sum_{i=1}^m (g_i \psi_i)(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)).$$


This device allows us to show that the following functions, which will be 
needed later, are primitive recursive: 


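3.9. $$\mathrm{qt}(x, y) = \begin{cases} \text{the integral part of } y/x, & \text{if } y/x \ge 1; \\ 1, & \text{if } y/x < 1. \end{cases}$$
We have
$$\mathrm{qt}(x, 1) = 1;$$
$$\mathrm{qt}(x, y + 1) = \begin{cases} \mathrm{qt}(x, y), & \text{if } \mathrm{rem}(x, y + 1) \ne x; \\ \mathrm{qt}(x, y) + 1, & \text{if } \mathrm{rem}(x, y + 1) = x \text{ and } y + 1 \ne x; \\ 1, & \text{if } y + 1 = x. \end{cases}$$

We reduce the conditions to the standard form 3.8 using the functions
$$s((\mathrm{rem}(x, y + 1) - x)^2 + 1),$$
$$\bar{s}((\mathrm{rem}(x, y + 1) - x)^2 + 1) \cdot s((x - y - 1)^2 + 1),$$
$$\bar{s}((x - y - 1)^2 + 1),$$
where $s = s_1^{2,1}$ and $\bar{s} = s_1^{1,2}$.
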
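3.10. $\mathrm{rad}(x)$ = the integral part of $\sqrt{x}$.
We have
$$\mathrm{rad}(1) = 1,$$
$$\mathrm{rad}(x + 1) = \begin{cases} \mathrm{rad}(x), & \text{if } \mathrm{qt}(\mathrm{rad}(x) + 1, x + 1) < \mathrm{rad}(x) + 1; \\ \mathrm{rad}(x) + 1, & \text{if } \mathrm{qt}(\mathrm{rad}(x) + 1, x + 1) = \mathrm{rad}(x) + 1. \end{cases}$$

The reduction of these conditions to the standard form 3.8 will be left to the
reader.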
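
The recursions for qt and rad in 3.9 and 3.10 can be run directly. In the Python sketch below the loop variable `z` plays the role of $y + 1$ (respectively $x + 1$); `rem` is written in closed form as a shortcut, which is an assumption of this illustration rather than the recursive definition of 3.7, and all function names are hypothetical.

```python
def rem(x, y):
    """Remainder of y modulo x, taken in [1, x] (closed-form shortcut)."""
    return (y - 1) % x + 1

def qt(x, y):
    """qt(x, 1) = 1, then the three-case recursive step of 3.9."""
    q = 1
    for z in range(2, y + 1):       # z = y' + 1 for the previous argument y'
        if z == x:
            q = 1                   # case y' + 1 = x
        elif rem(x, z) == x:
            q += 1                  # case rem = x and y' + 1 != x
        # otherwise rem != x and q is unchanged
    return q

def rad(x):
    """rad(1) = 1, then the two-case recursive step of 3.10."""
    r = 1
    for z in range(2, x + 1):       # z = x' + 1 for the previous argument x'
        if qt(r + 1, z) == r + 1:
            r += 1
    return r
```

On small inputs `qt(x, y)` agrees with `max(1, y // x)` and `rad(x)` with the integer square root, for example `qt(3, 7)` is 2 and `rad(10)` is 3.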


3.11. (a) $\min(x, y)$:


(b) $\max(x, y)$: analogous.

3.12. If $f(x_1, \dots, x_n)$ is recursive, then
$$Sf = \sum_{k=1}^{x_n} f(x_1, \dots, x_{n-1}, k) \quad \text{and} \quad Pf = \prod_{k=1}^{x_n} f(x_1, \dots, x_{n-1}, k)$$
are recursive.
In fact,
$$Sf(x_1, \dots, x_{n-1}, x_n + 1) = Sf(x_1, \dots, x_n) + f(x_1, \dots, x_n + 1),$$
$$Pf(x_1, \dots, x_{n-1}, x_n + 1) = Pf(x_1, \dots, x_n) \cdot f(x_1, \dots, x_n + 1).$$


3.13. If $f(x_1, \dots, x_n)$ is recursive, then so are the functions obtained from
$f$ by:


(a) any permutation of the arguments; 

(b) adding any number of “dummy” arguments; 

(c) identifying the elements of any subset of the arguments (f(x,x) instead of 
f(x,y), and so on). 


In fact, all of these functions can be obtained from $f$ and the various $\mathrm{pr}_i^n$
using composition and juxtaposition.


3.14. A map $f: (Z^+)^m \to (Z^+)^n$ is recursive if and only if all of its components
$\mathrm{pr}_i^n \circ f$ are recursive.

This is obvious. 

In conclusion, we note that all the specific functions described above are 
primitive recursive, and that all the above general operations, when applied to 
primitive recursive functions, yield primitive recursive functions. Starting in the 
next section, we shall make essential use of the $\mu$-operator, which was defined
in 2.3(d). 


4 Enumerable and Decidable Sets 


4.1. Definition. A set $E \subset (Z^+)^n$ is called recursively enumerable if there exists
a partial recursive function $f$ such that $E = D(f)$ (the domain of definition
of $f$).

The discussion in §1 and §2 showed that recursive enumerability has the 
following intuitive meaning: there exists a program that identifies the elements 
$x$ in $E$ but that might not identify the elements not in $E$. Later, in 4.12 and 4.18,
we shall give another intuitive description of recursively enumerable sets that 
is more closely related to the etymology of the name: these are sets all of whose 
elements can be obtained using a suitable “generating” program (perhaps with 
repetitions and with no indication of the order in which the elements occur). 

The concept of a recursively enumerable set occupies a central place in the
theory of computability, alongside the concept of a partial recursive function.
It will later be clear, in particular from Proposition 4.15, that either of these
concepts can be reduced to the other one. However, only by using both ideas
together do we obtain the flexibility necessary for efficient proofs.

We begin with the following simple fact. 

Recall that the level set at $m$ (or simply the $m$-level) of a function $f$ from
$(Z^+)^n$ to $Z^+$ is the set $E \subset D(f)$ such that
$$x \in E \iff f(x) = m.$$


4.2. Proposition. The following three classes of sets coincide: 


(a) Recursively enumerable sets. 
(b) Level sets of partial recursive functions. 
(c) Level sets at 1 of partial recursive functions. 
(a) $\subset$ (c). Suppose that $E$ is recursively enumerable, so that $E = D(f)$, where
$f$ is partial recursive. Then $E$ = the 1-level of the function $1 \circ f$.
(b) = (c). The $m$-level of $f$ coincides with the 1-level of $(f - m)^2 + 1$. The
function $(f - m)^2 + 1$ is partial recursive whenever $f$ is, by Proposition 3.5.
(c) $\subset$ (a). Suppose that $E$ is the 1-level of a partial recursive function
$f(x_1, \dots, x_n)$. Set
$$g(x_1, \dots, x_n) = \min\{y \mid (f(x_1, \dots, x_n) - 1)^2 + y = 1\}.$$

Obviously, $g$ is partial recursive and $E = D(g)$.


The following much more difficult assertion, along with its corollaries, 
constitutes the central result of this section. 


4.3. Theorem. The following two classes of sets coincide: 


(a) Recursively enumerable sets. 
(b) Projections of level sets of primitive recursive functions with values in $Z^+$.


4.4. FIRST PART OF THE PROOF. We first recall that if we are given a set
$E \subset (Z^+)^{n+m}$, then its projection (“onto the space of the first $n$ coordinates”)
is the set $F \subset (Z^+)^n$ that is defined as follows:
$$(x_1, \dots, x_n) \in F \iff \exists (y_1, \dots, y_m) \in (Z^+)^m,\ (x_1, \dots, x_n, y_1, \dots, y_m) \in E.$$

(From this point on, we shall not adhere to the practice in Part I of using
different notation for “variable coordinates” and for particular values of the
coordinates.) We similarly define the projection “onto the coordinates with
indices $(i_1, \dots, i_n) \subset (1, \dots, n + m)$.” The number $m$ is called the codimension
of the projection. The canonical map $E \to F$ (as well as its image) is also
customarily called a projection, but this is not likely to cause any confusion.

For the time being we shall call projections of level sets of primitive recur- 
sive functions primitive enumerable sets. The first part of the proof consists in 
showing that primitive enumerable sets are recursively enumerable; the second 
part consists in verifying the converse implication. 

Thus, let $f(x_1, \dots, x_n, x_{n+1}, \dots, x_{n+m})$ be a primitive recursive function,
and let $E$ be the projection of its 1-level onto the first $n$ coordinates. (We need
only consider 1-levels because of the consideration used once before: the $k$-level
of $f$ coincides with the 1-level of $f' = (f - k)^2 + 1$.) We explicitly construct a
partial recursive function $g$ such that $E = D(g)$.

We distinguish three cases, depending on the codimension of the projection:
$m = 0$, $m = 1$, and $m \ge 2$.

Case (a): $m = 0$. Then $E$ = the 1-level of $f$ $\Rightarrow$ $E$ is recursively enumerable,
by Proposition 4.2 (where $g$ is constructed explicitly).

Case (b): $m = 1$. Let
$$g(x_1, \dots, x_n) = \min\{x_{n+1} \mid f(x_1, \dots, x_n, x_{n+1}) = 1\}.$$

Obviously, $g$ is partial recursive, and $D(g) = E$. (Notice that we have used here
the fact that $D(f) = (Z^+)^{n+1}$.)

Case (c): $m \ge 2$. We reduce this case to the previous one using the following
lemma, which is also important in many other situations and is of interest in 
its own right (as a statement that there is no notion of dimension in “recursive 
geometry”). 


4.5. Lemma. For each $m \ge 1$ there exists a one-to-one mapping $t^{(m)}:
Z^+ \to (Z^+)^m$ such that

(a) The function $t_i^{(m)} = \mathrm{pr}_i^m \circ t^{(m)}$ is primitive recursive for all $1 \le i \le m$.
(b) The inverse function $\tau^{(m)}: (Z^+)^m \to Z^+$ is primitive recursive.


4.6. How the lemma is used. Suppose that the lemma is true. We apply it to
the situation in case (c) in 4.4 as follows. For $m \ge 2$ we set
$$g(x_1, \dots, x_n, y) = f(x_1, \dots, x_n, t_1^{(m)}(y), \dots, t_m^{(m)}(y)).$$

Obviously, $g$ is primitive recursive if $f$ is. It is easy to see that $E$ coincides with
the projection of the 1-level of $g$ onto the first $n$ coordinates. Since this is a
projection of codimension 1, we have reduced this case to the previous one.


4.7. PROOF OF THE LEMMA. The case $m = 1$ is trivial. We use induction on $m$,
starting with $m = 2$.

Construction of $t^{(2)}$. We first construct $\tau^{(2)}: (Z^+)^2 \to Z^+$ explicitly by
setting
$$\tau^{(2)}(x_1, x_2) = \frac{1}{2}\left((x_1 + x_2)^2 - x_1 - 3x_2 + 2\right).$$

It is easy to see that if we list the pairs $(x_1, x_2) \in (Z^+)^2$ in “Cantor order,” i.e.,
according to increasing $x_1 + x_2$ and, among those with given $x_1 + x_2$, according
to increasing $x_1$, then $\tau^{(2)}(x_1, x_2)$ will be precisely the index of the pair $(x_1, x_2)$
in this list. Thus, $\tau^{(2)}$ is a one-to-one correspondence and, moreover, is primitive
recursive (where we use Proposition 3.5 and then the recursiveness of qt in 3.9
to take care of the 1/2).

The calculation of the pair $(x_1, x_2)$ as a function of its index $y$ is an
elementary problem, and results in the following formulas for the inverse
function $t^{(2)}$:


Here [z] denotes the integral part of z. The verification that these functions are 
primitive recursive using the results (and techniques) of §3 is left to the reader 
as an exercise. 
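
The pairing of $(Z^+)^2$ with $Z^+$ in Cantor order can be explored with a short Python sketch. Two assumptions are made and should be read as such: the pairing is taken in the form $\frac{1}{2}((x_1 + x_2)^2 - x_1 - 3x_2 + 2)$, and the inverse is found by a simple search over the diagonal $x_1 + x_2 = s$ rather than by closed formulas involving integral parts. The names `tau2` and `t2` are hypothetical.

```python
def tau2(x1: int, x2: int) -> int:
    """Index of (x1, x2) when pairs are listed by increasing x1 + x2,
    and by increasing x1 within a fixed sum."""
    return ((x1 + x2) ** 2 - x1 - 3 * x2 + 2) // 2

def t2(y: int) -> tuple:
    """Inverse of tau2: the pair (x1, x2) with index y >= 1."""
    s = 2                                  # candidate value of x1 + x2
    while s * (s - 1) // 2 < y:            # last index on diagonal s is s(s-1)/2
        s += 1
    x1 = y - (s - 1) * (s - 2) // 2        # position within the diagonal
    return (x1, s - x1)
```

Listing `t2(1), t2(2), ...` produces (1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), and so on, which is the Cantor order described above.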

Construction of $t^{(m)}$, $m \ge 3$. Suppose that $t^{(m-1)}$ and $\tau^{(m-1)}$ have already
been constructed with the required properties. We first set
$$\tau^{(m)}(x_1, \dots, x_m) = \tau^{(2)}(\tau^{(m-1)}(x_1, \dots, x_{m-1}), x_m).$$

It is clear that $\tau^{(m)}$ is one-to-one and primitive recursive. Solving the equation
$\tau^{(2)}(\tau^{(m-1)}(x_1, \dots, x_{m-1}), x_m) = y$ in two steps, we obtain the following
formulas for the inverse function $t^{(m)}$:
$$t_m^{(m)}(y) = t_2^{(2)}(y),$$
$$t_i^{(m)}(y) = t_i^{(m-1)}(t_1^{(2)}(y)), \quad 1 \le i \le m - 1.$$
The $t_i^{(m)}$ are primitive recursive by the induction assumption. This completes
the proof of the lemma, and by the same token the first part of the proof of
Theorem 4.3.
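
The inductive step can be sketched in Python by iterating a pairing function. The block is self-contained and rests on assumptions: the pairing `tau2` is taken in the Cantor-order form $\frac{1}{2}((x_1 + x_2)^2 - x_1 - 3x_2 + 2)$, its inverse `t2` is computed by search, and the names `tau` and `t` for the $m$-fold versions are hypothetical.

```python
def tau2(x1, x2):
    """Assumed Cantor-order pairing of (Z+)^2 with Z+."""
    return ((x1 + x2) ** 2 - x1 - 3 * x2 + 2) // 2

def t2(y):
    """Inverse of tau2, found by locating the diagonal x1 + x2 = s."""
    s = 2
    while s * (s - 1) // 2 < y:
        s += 1
    x1 = y - (s - 1) * (s - 2) // 2
    return (x1, s - x1)

def tau(xs):
    """tau^(m)(x_1..x_m) = tau2(tau^(m-1)(x_1..x_{m-1}), x_m)."""
    code = xs[0]
    for x in xs[1:]:
        code = tau2(code, x)
    return code

def t(m, y):
    """Inverse of tau on m-tuples: peel off the last coordinate m - 1 times."""
    out = []
    for _ in range(m - 1):
        y, last = t2(y)         # y becomes the code of the first m - 1 coordinates
        out.append(last)
    out.append(y)               # what remains is x_1
    return tuple(reversed(out))
```

For example `t(3, tau((2, 5, 3)))` returns (2, 5, 3), and every positive integer decodes to exactly one triple.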


SECOND PART OF THE PROOF. We must now show that every recursively 
enumerable set is primitive enumerable. We begin with the following property 
of the class of primitive enumerable sets. 


4.8. Lemma. The class of primitive enumerable sets is closed with respect to 
the following operations: finite direct product, finite intersection, finite union, 
and projection. 


PROOF. Let $E, E' \subset (Z^+)^n$ and $E_1 \subset (Z^+)^m$ be three primitive enumerable
sets that are projections of the 1-levels of the primitive recursive functions
$f$, $f'$, and $f_1$, respectively:
$$x = (x_1, \dots, x_n) \in E \iff \exists y = (y_1, \dots, y_r),\ f(x, y) = 1,$$
$$x = (x_1, \dots, x_n) \in E' \iff \exists z = (z_1, \dots, z_q),\ f'(x, z) = 1,$$
$$u = (u_1, \dots, u_m) \in E_1 \iff \exists v = (v_1, \dots, v_s),\ f_1(u, v) = 1.$$

We then have

$E \times E_1$ = a projection of the 1-level of the function
$$g(x, u; y, v) = f(x, y) \cdot f_1(u, v);$$
$E \cup E'$ = a projection of the 1-level of the function
$$g(x, y, z) = (f(x, y) - 1)(f'(x, z) - 1) + 1;$$
$E \cap E'$ = a projection of the 1-level of the function
$$g(x, y, z) = f(x, y) \cdot f'(x, z).$$


Closure with respect to the projection operation is clear from the definition. 
Lemma 4.8 is proved. 
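
The closure properties can be illustrated on a small example in Python. Everything specific below is an assumption made for the illustration: the sets are the even numbers and the multiples of 3, realized as projections of 1-levels of $(x - 2y)^2 + 1$ and $(x - 3z)^2 + 1$; the union is realized through $(f - 1)(f' - 1) + 1$, a standard choice; and the projection is only searched over a finite box of size `bound`.

```python
def f(x, y):
    """1-level projects onto the even numbers: x = 2y."""
    return (x - 2 * y) ** 2 + 1

def f_prime(x, z):
    """1-level projects onto the multiples of 3: x = 3z."""
    return (x - 3 * z) ** 2 + 1

def g_and(x, y, z):
    """Equals 1 exactly when both f and f_prime equal 1."""
    return f(x, y) * f_prime(x, z)

def g_or(x, y, z):
    """Equals 1 exactly when at least one of f, f_prime equals 1."""
    return (f(x, y) - 1) * (f_prime(x, z) - 1) + 1

def projection(g, bound):
    """Projection of the 1-level of g onto x, searched within [1, bound]^3."""
    r = range(1, bound + 1)
    return {x for x in r if any(g(x, y, z) == 1 for y in r for z in r)}
```

Within the box of size 12, `projection(g_and, 12)` is {6, 12} and `projection(g_or, 12)` consists of the numbers divisible by 2 or by 3.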


Now let $E$ be a recursively enumerable set. We realize $E$ as the 1-level of
a partial recursive function $f$ from $(Z^+)^n$ to $Z^+$ using Proposition 4.2, and we
note that to prove that $E$ is primitive enumerable, it suffices to show that the
graph $\Gamma_f \subset (Z^+)^n \times Z^+$ of $f$ is primitive enumerable. In fact, it is clear that
$E$ = the 1-level of $f$ = the projection of the set $\Gamma_f \cap [(Z^+)^n \times \{1\}]$ onto the first
$n$ coordinates. Here the set $\{1\} \subset Z^+$ is primitive enumerable (for example,
by 3.6), so that if we prove that $\Gamma_f$ is primitive enumerable, it will follow from
Lemma 4.8 that the same is true for $E$. Thus, our problem is finally reduced to
the following form: we must prove that the graph of a partial recursive function
$f$ is primitive enumerable. To do this we verify that first, the graphs of the
simplest functions are primitive enumerable, and second, if we apply any of the
elementary operations to functions having primitive enumerable graphs, then
the resulting function also has a primitive enumerable graph.

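Graphs of the basic functions
$$\Gamma_{\mathrm{suc}} \subset (Z^+)^2 = \text{the 1-level of } (x_1 + 1 - x_2)^2 + 1,$$
$$\Gamma_{1^{(n)}} \subset (Z^+)^{n+1} = \text{the 1-level of } x_{n+1},$$
$$\Gamma_{\mathrm{pr}_i^n} \subset (Z^+)^{n+1} = \text{the 1-level of } (x_i - x_{n+1})^2 + 1.$$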


Stability under juxtaposition. Let $f$ and $g$ be partial functions from $(Z^+)^m$
to $(Z^+)^p$ and $(Z^+)^q$, respectively. Suppose that $\Gamma_f$ and $\Gamma_g$ are primitive enumerable.
Then $\Gamma_{(f,g)} \subset (Z^+)^m \times (Z^+)^p \times (Z^+)^q$ coincides with the intersection
$$(\Gamma_f \times (Z^+)^q) \cap \mathrm{perm}(\Gamma_g \times (Z^+)^p),$$
where $\mathrm{perm}: (Z^+)^m \times (Z^+)^q \times (Z^+)^p \to (Z^+)^m \times (Z^+)^p \times (Z^+)^q$ is the operation
of permuting the last two factors:
$$(x^{(m)}, y^{(q)}, z^{(p)}) \mapsto (x^{(m)}, z^{(p)}, y^{(q)}).$$

It is clear from Lemma 4.8 that $\Gamma_{(f,g)}$ is primitive enumerable.

Stability under composition. Let $g$ be a partial function from $(Z^+)^n$ to
$(Z^+)^m$, let $f$ be a partial function from $(Z^+)^m$ to $(Z^+)^l$, and let $h = f \circ g$. Then
$\Gamma_h$ = the projection of the set $(\Gamma_g \times (Z^+)^l) \cap ((Z^+)^n \times \Gamma_f)$ onto $(Z^+)^n \times (Z^+)^l$.
As before, if $\Gamma_f$ and $\Gamma_g$ are primitive enumerable, then so is $\Gamma_h$ by Lemma 4.8.

The stability relative to recursion and the $\mu$-operator is much subtler. We
shall need the following elegant and useful lemma.


4.9. Lemma. There exists a primitive recursive function $\mathrm{Gd}(k, t)$ (Gödel’s
function) with the following property: for any $N \in Z^+$ and any finite sequence
$a_1, \dots, a_N \in Z^+$ of length $N$, there exists $t \in Z^+$ such that $\mathrm{Gd}(k, t) = a_k$ for
all $1 \le k \le N$. (In other words, the function Gd allows us to consider integers
as encoding arbitrarily long sequences of integers: $\mathrm{Gd}(k, t)$ is the $k$th member
of the sequence encoded by $t$, and the existence assertion ensures that each
sequence has an encoding.)


PROOF. We first set
$$\mathrm{gd}(u, k, t) = \mathrm{rem}(1 + kt, u)$$
and show that gd has the same property as Gd if we are allowed to choose
$(u, t) \in (Z^+)^2$. Once we show this, we can set $\mathrm{Gd}(k, y) = \mathrm{gd}(t_1^{(2)}(y), k, t_2^{(2)}(y))$,
where $t^{(2)}: Z^+ \to (Z^+)^2$ is the isomorphism in Lemma 4.5. (It is not really
essential to remove the extra parameter in $\mathrm{gd}(u, k, t)$, but working with $\mathrm{Gd}(k, t)$
will make some of the formulas shorter.)

Thus, suppose we are given $a_1, \dots, a_N \in Z^+$. We first choose $X \in Z^+$ so as
to satisfy $X \ge N$ and $1 + kX! > a_k$ for all $1 \le k \le N$. We then set $t = X!$.
It is easy to see that if $k_1 \ne k_2$ and $k_1, k_2 \le N$, then $1 + k_1 X!$ and $1 + k_2 X!$ are
relatively prime, since any common divisor would have to divide $(k_1 - k_2)X!$,
i.e., would have to consist of primes $\le X$, but no such prime divides $1 + k_1 X!$.

By the Chinese remainder theorem, there exists a solution $u \in Z^+$ of the
system of equations
$$u \equiv a_k \bmod (1 + kX!), \quad 1 \le k \le N.$$

It is then obvious that
$$\mathrm{gd}(u, k, t) = \mathrm{rem}(1 + kt, u) = a_k, \quad 1 \le k \le N.$$
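
The construction in this proof is effective and can be followed step by step in Python. The sketch below is an illustration under assumptions: `rem` is written in closed form, the Chinese remainder step is carried out by successive substitution using Python's built-in modular inverse `pow(a, -1, m)` (available from Python 3.8), and the function name `encode` is hypothetical. It returns the pair $(u, t)$ before the two parameters are merged into one by the pairing of Lemma 4.5.

```python
from math import factorial

def rem(x, y):
    """Remainder of y modulo x, taken in [1, x]."""
    return (y - 1) % x + 1

def gd(u, k, t):
    """gd(u, k, t) = rem(1 + k t, u)."""
    return rem(1 + k * t, u)

def encode(seq):
    """Return (u, t) with gd(u, k, t) == seq[k - 1] for 1 <= k <= len(seq)."""
    n = len(seq)
    x = n                                       # need X >= N ...
    while any(1 + k * factorial(x) <= a for k, a in enumerate(seq, 1)):
        x += 1                                  # ... and 1 + k X! > a_k
    t = factorial(x)
    u, modulus = 0, 1
    for k, a in enumerate(seq, 1):
        m = 1 + k * t                           # pairwise coprime moduli
        inv = pow(modulus, -1, m)               # inverse of modulus mod m
        u += modulus * (((a - u) * inv) % m)    # now u = a (mod m) as well
        modulus *= m
    return u, t
```

For the sequence 5, 3, 8 the loop chooses $X = 3$, so $t = 6$ and the moduli are 7, 13, 19; decoding with `gd(u, k, t)` for $k = 1, 2, 3$ recovers the three entries.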
We now continue with the proof of Theorem 4.3. 
4.10. Stability relative to the $\mu$-operator. Let $f$ be a partial function from
$(Z^+)^{n+1}$ to $Z^+$ and let
$$g(x_1, \dots, x_n) = \min\{y \mid f(x_1, \dots, x_n, y) = 1\}.$$

Recall that the domain of definition of $g$ consists of those $(x_1, \dots, x_n)$ for which
such a $y$ exists and $(x_1, \dots, x_n, k) \in D(f)$ for all $k$ less than the least such $y$.
We want to prove that if $\Gamma_f$ is primitive enumerable, then so is $\Gamma_g$.

Suppose that $\Gamma_f$ is the projection onto the first $n + 2$ coordinates of the
1-level of a primitive recursive function $F$:
$$\xi = f(x_1, \dots, x_{n+1}) \iff \exists (y_1, \dots, y_m),\ F(x_1, \dots, x_{n+1}, \xi; y_1, \dots, y_m) = 1$$
(where $\xi$ has been used to denote the argument of $F$ that becomes the value
of $f$). As in 4.4, it suffices to consider the case $m = 1$, since if $m \ge 2$, then
we can use Lemma 4.5 to replace the vector $(y_1, \dots, y_m)$ by a single $y$, and if
$m = 0$, then we can introduce a “dummy argument” $y$ on which $F$ does not
actually depend.

Thus, let $m = 1$. We introduce a function $G$ of the arguments $x_1, \dots, x_n, \eta; y;
t, t_1$ by setting $s(1) = 2$, $s(x) = 1$ for $x \ge 2$, and
$$F_k = F(x_1, \dots, x_n, k, \mathrm{Gd}(k, t), \mathrm{Gd}(k, t_1)), \quad k \ge 1,$$
$$G = F(x_1, \dots, x_n, \eta, 1, y) \prod_{k=1}^{\eta - 1} \left( s(\mathrm{Gd}(k, t)) \cdot F_k \right).$$

Here $\prod_{k=1}^{0} = 1$ by definition. It is easy to see that $G$ is primitive recursive, since
it is obtained by recursion on $\eta$ from two other functions that are obviously
primitive recursive. We shall show that $\Gamma_g$ is the projection of the 1-level of $G$
onto the coordinates $(x_1, \dots, x_n, \eta)$.

The inclusion $\mathrm{pr}(G = 1) \subset \Gamma_g$. Let $(x_1, \dots, x_n, \eta, y, t, t_1)$ be a point in the
1-level of $G$. We must verify that $(x_1, \dots, x_n) \in D(g)$ and that $\eta = g(x_1, \dots, x_n)$.
In other words, we must show that
$$f(x_1, \dots, x_n, \eta) = 1;$$
$$f(x_1, \dots, x_n, k) \text{ is defined and } > 1 \text{ for all } k \le \eta - 1.$$

Since $G = 1$ at the given point, it follows that all the factors in $G$ equal 1 there.
In particular, $F(x_1, \dots, x_n, \eta, 1, y) = 1$, which implies that $f(x_1, \dots, x_n, \eta) = 1$,
because $\Gamma_f$ is the projection of the 1-level of $F$. If $\eta = 1$, there is nothing more
to be proved.

Suppose $\eta > 1$. Since the $k$th factor in the product $\prod_{k=1}^{\eta - 1}$ equals 1, we obtain
$F_k = 1$, so that $f(x_1, \dots, x_n, k)$ is defined and equal to $\mathrm{Gd}(k, t)$, and
$$s(\mathrm{Gd}(k, t)) = 1 \Rightarrow \mathrm{Gd}(k, t) \ge 2,$$
as required.

The inclusion $\Gamma_g \subset \mathrm{pr}(G = 1)$. Let $(x_1, \dots, x_n, \eta) \in \Gamma_g$. We must choose
values for the remaining coordinates $y$, $t$, and $t_1$ in such a way as to make all
the factors in $G$ equal to 1.

First of all, $(x_1, \dots, x_n, \eta, 1) \in \Gamma_f$ by the definition of $g$. We find the
necessary value of $y$ by lifting this point from $\Gamma_f$ to the 1-level of $F$. If $\eta = 1$,
we may choose arbitrary values of $t$ and $t_1$.

Suppose $\eta > 1$. We then find $t$ from the system of equations
$$\mathrm{Gd}(k, t) = f(x_1, \dots, x_n, k), \quad \text{for all } 1 \le k \le \eta - 1.$$

(Here the right side exists by the definition of $D(g)$.)
Finally, for each $k \le \eta - 1$ we lift the point
$$(x_1, \dots, x_n, k, \mathrm{Gd}(k, t)) \in \Gamma_f$$
to a point on $F = 1$ having additional coordinate $y^{(k)}$, and then we find $t_1$ from
the system of equations
$$\mathrm{Gd}(k, t_1) = y^{(k)}, \quad 1 \le k \le \eta - 1.$$
This makes all the factors in $\prod_{k=1}^{\eta - 1}$ equal to 1. In fact, $s(\mathrm{Gd}(k, t)) = 1$, since
$\mathrm{Gd}(k, t) = f(x_1, \dots, x_n, k) \ge 2$ for $k \le \eta - 1$, and, finally,
$F_k = F(x_1, \dots, x_n, k, \mathrm{Gd}(k, t), \mathrm{Gd}(k, t_1)) = 1$ by the definition of $t$ and $t_1$.


4.11. Stability relative to recursion. We now carry out the last step in the proof 
of Theorem 4.3. 

Let $f$ and $g$ be partial functions of $n$ and $n + 2$ variables, respectively, and
let $h$ be the function of $n + 1$ variables that is obtained from $f$ and $g$ using
recursion:
$$h(x_1, \dots, x_n, 1) = f(x_1, \dots, x_n),$$
$$h(x_1, \dots, x_n, k + 1) = g(x_1, \dots, x_n, k, h(x_1, \dots, x_n, k)).$$
We must show that if $\Gamma_f$ and $\Gamma_g$ are primitive enumerable, then so is $\Gamma_h$.

Let $F$ and $G$ be primitive recursive functions whose 1-levels project onto $\Gamma_f$
and $\Gamma_g$, respectively:
$$\xi = f(x_1, \dots, x_n) \iff \exists y,\ F(x_1, \dots, x_n, \xi, y) = 1,$$
$$\zeta = g(x_1, \dots, x_{n+2}) \iff \exists z,\ G(x_1, \dots, x_{n+2}, \zeta, z) = 1,$$
where, as in 4.10, it suffices to consider the case in which the projection
codimension is 1.

We shall explicitly construct a function $H$ whose 1-level projects onto $\Gamma_h$.
$H$ will be a function of the arguments $x_1, \dots, x_{n+1}, \eta, y, t, t_1$ (where $\eta$ is the
argument that becomes the value of $h$). We set


5(1) =1, 5(a) = 2, for x > 2; 
Gi = G(a1,...,&n,k —1,Gd(k — 1,t), Gd(k, t), Gd(k, t1)); 


Tn+1 


H = F(a1,...,2n,Gd(1,),y) - al(n — Gd(tn41,t))” + 1| likes 
k=2 


(We take [],"5' = 1ifan41 = 1.) As in 4.10, we easily verify that H is primitive 
recursive. 

The inclusion $\mathrm{pr}(H = 1) \subset \Gamma_h$. Let $(x_1,\dots,x_{n+1},\eta,y,t,t_1)$ be a point on 
$H = 1$. We must show that $h(x_1,\dots,x_{n+1}) = \eta$. Since the second factor in $H$ 
equals 1, we first obtain $\eta = \mathrm{Gd}(x_{n+1},t)$. If we also have $x_{n+1} = 1$, then setting 
the first factor in $H$ equal to 1 gives 


$$\eta = \mathrm{Gd}(1,t) = f(x_1,\dots,x_n) = h(x_1,\dots,x_n,1).$$


Now suppose $x_{n+1} > 1$. In this case, using the equation $G_k = 1$ we find that 
for all $2 \le k \le x_{n+1}$, 


$$\mathrm{Gd}(k,t) = g(x_1,\dots,x_n,k-1,\mathrm{Gd}(k-1,t)),$$

and using the equation $F = 1$ and the definition of $h$ we find that 

$$\mathrm{Gd}(1,t) = f(x_1,\dots,x_n) = h(x_1,\dots,x_n,1).$$

If we increase $k$ from $k = 1$ to $k = x_{n+1}$ and use the recursive definition of $h$, 
we see by induction on $k$ that $\mathrm{Gd}(k,t) = h(x_1,\dots,x_n,k)$ and, in particular, 


$$\eta = \mathrm{Gd}(x_{n+1},t) = h(x_1,\dots,x_n,x_{n+1}).$$
The inclusion $\Gamma_h \subset \mathrm{pr}(H = 1)$. We are given a point 

$$(x_1,\dots,x_{n+1},h(x_1,\dots,x_{n+1})) \in \Gamma_h.$$


We let $\eta = h(x_1,\dots,x_{n+1})$. We must also choose values of $y$, $t$, and $t_1$ so as to 
make $H$ equal to 1. 

If $x_{n+1} = 1$, we choose $t$ such that $\mathrm{Gd}(1,t) = h(x_1,\dots,x_n,1) = f(x_1,\dots,x_n)$. 
We then lift the point $(x_1,\dots,x_n,\mathrm{Gd}(1,t)) \in \Gamma_f$ to a point on $F = 1$. This 
gives us the value of $y$; $t_1$ may be chosen arbitrarily. 

Now let $x_{n+1} > 1$. We first find $t$ from the system of equations 


$$\mathrm{Gd}(1,t) = f(x_1,\dots,x_n) = h(x_1,\dots,x_n,1);$$
$$\mathrm{Gd}(k,t) = h(x_1,\dots,x_n,k) = g(x_1,\dots,x_n,k-1,\mathrm{Gd}(k-1,t)), \quad 2 \le k \le x_{n+1}.$$


We then find $y$ by lifting the point $(x_1,\dots,x_n,\mathrm{Gd}(1,t)) \in \Gamma_f$ to the 1-level 
of $F$. This makes the first two factors in $H$ equal to 1. 
We next lift the points 

$$(x_1,\dots,x_n,k-1,\mathrm{Gd}(k-1,t),\mathrm{Gd}(k,t)) \in \Gamma_g, \quad 2 \le k \le x_{n+1},$$


to the 1-level of $G$ by adding coordinates $z^{(k)}$, and then solve the following 
system of equations for $t_1$: 


$$\mathrm{Gd}(k,t_1) = z^{(k)}, \quad 2 \le k \le x_{n+1}.$$


This makes the $G_k$ factors in $H$ equal to 1. 
The proof of Theorem 4.3 is complete. 


4.12. Explanation of the term “recursively enumerable set.” Theorem 4.3 shows 
that if $E$ is recursively enumerable, then there exists a program that “generates” 
$E$ (see 4.1). In fact, suppose $E$ is the projection onto the first $n$ coordinates of 
the 1-level of the primitive recursive function $f(x_1,\dots,x_n,y)$. The program that 
generates $E$ must run through the vectors $(x_1,\dots,x_n,y)$, say in Cantor order, 
compute $f$ at each vector, and give $(x_1,\dots,x_n)$ as output if and only if $f$ equals 
1 (compare with Corollary 4.18 below). Unlike programs of the type described 
in §1, which can become stuck forever on an element not in $E$, a generating 
program sooner or later gives us any given element of $E$, and nothing other 
than such elements. However, if $E$ is empty, we might never find this out. 
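
The generating program just described can be sketched in Python. This is only an illustration under stated assumptions: the case n = 1 is taken, pairs are visited diagonal by diagonal (one common form of Cantor order), and the function `f` below is a hypothetical stand-in for the primitive recursive function of the text.

```python
from itertools import count, islice

def cantor_pairs():
    """Yield every pair (x, y) of positive integers, diagonal by diagonal."""
    for s in count(2):              # s = x + y
        for x in range(1, s):
            yield x, s - x

def generate(f):
    """Yield x each time a visited pair (x, y) satisfies f(x, y) == 1.

    Every element of the projection of the 1-level of f is eventually
    produced (possibly more than once), and nothing else is produced.
    """
    for x, y in cantor_pairs():
        if f(x, y) == 1:
            yield x

# Hypothetical example: the 1-level of f projects onto the perfect squares.
def f(x, y):
    return (x - y * y) ** 2 + 1

first_squares = list(islice(generate(f), 4))
```

If the set were empty, `generate` would simply run forever without output, which is the situation described in the last sentence of 4.12.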

We conclude this section by discussing the properties of the so-called decidable 
sets. Intuitively, $E \subset (\mathbf{Z}^+)^n$ is decidable if there exists a program that for 
every element of $(\mathbf{Z}^+)^n$ tells whether it belongs to $E$. 
4.13. Definition. A set $E \subset (\mathbf{Z}^+)^n$ is called decidable if both it and its 
complement are recursively enumerable. 


In §5 and in the next chapter we show that there exist sets that are 
recursively enumerable but not decidable. This result is closely connected with 
Gödel’s incompleteness theorem, which is the subject of Chapter VII. 
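
The intuitive content of Definition 4.13 is that membership can be settled by running two generating programs side by side. A minimal Python sketch, assuming both enumerations are supplied as infinite generators; the names `decide`, `evens`, and `odds` are illustrative and do not come from the text.

```python
from itertools import count

def decide(x, enum_set, enum_complement):
    """Return True if x lies in E and False otherwise.

    enum_set and enum_complement are zero-argument functions returning
    infinite generators that list E and its complement, respectively.
    Exactly one of the two eventually produces x, so the loop terminates.
    """
    gen_in, gen_out = enum_set(), enum_complement()
    while True:
        if next(gen_in) == x:
            return True
        if next(gen_out) == x:
            return False

# Illustrative example: E = the even positive integers.
def evens():
    return (2 * k for k in count(1))

def odds():
    return (2 * k - 1 for k in count(1))
```

For instance, `decide(10, evens, odds)` returns `True` after reading five elements from each stream.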


4.14. Theorem. The following three classes of sets coincide: 


(a) sets whose characteristic function is recursive; 

(b) level sets of general recursive (i.e., everywhere defined partial recursive) 
functions; 

(c) decidable sets. 


PROOF. The inclusions (a) $\subset$ (b) and (b) $\subset$ (c) are obvious from what has 
already been proved. It thus remains to show that (c) $\subset$ (a). 

Let $E \subset (\mathbf{Z}^+)^n$ be a decidable set, and let $E'$ be its complement. By definition, 
$E = D(f)$ and $E' = D(f')$ for certain partial recursive functions $f$ and $f'$. 
We may even assume that $f = 1$ and $f' = 2$ (where they are defined). We consider 
$\Gamma_f \cup \Gamma_{f'} \subset (\mathbf{Z}^+)^n \times \mathbf{Z}^+$. This union is obviously the graph $\Gamma_g$ of the 
characteristic function $g$ of the set $E$. It is clear from the proof of Lemma 4.8 
that $\Gamma_g$ is recursively enumerable whenever $\Gamma_f$ and $\Gamma_{f'}$ are. Hence, the partial 
recursiveness of $g$ is implied by the following result, which is also of independent 
interest. 


4.15. Proposition. In order for a partial function $g$ from $(\mathbf{Z}^+)^n$ to $\mathbf{Z}^+$ to be 
partial recursive, it is necessary and sufficient that its graph $\Gamma_g$ be recursively 
enumerable. 


PROOF. Necessity has already been proved. 

We verify sufficiency. Since $\Gamma_g$ is recursively enumerable, there exists a 
primitive recursive function $G(x_1,\dots,x_n,\eta,z)$ (see 4.10) such that $\Gamma_g$ = the 
projection of the 1-level of $G$ onto $(x_1,\dots,x_n,\eta)$. We set 


$$H(x_1,\dots,x_n,u) = G(x_1,\dots,x_n,\tau_1^{(2)}(u),\tau_2^{(2)}(u)),$$


where $u \mapsto (\tau_1^{(2)}(u), \tau_2^{(2)}(u))$ is the primitive recursive isomorphism $\mathbf{Z}^+ \to (\mathbf{Z}^+)^2$ 
described in 4.5 and 4.7. $H$ is obviously primitive recursive. Finally, we set 


$$h(x_1,\dots,x_n) = \min\{u \mid H(x_1,\dots,x_n,u) = 1\}.$$


This is a partial recursive function whose domain of definition coincides with 
$D(g)$ and that easily allows us to compute $g$: 


$$g(x_1,\dots,x_n) = \tau_1^{(2)}(h(x_1,\dots,x_n)).$$


Thus, $g$ is partial recursive, and the proof of Proposition 4.15 and Theorem 4.14 
is complete. 
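
The sufficiency argument is a single unbounded search followed by a projection, which the following Python sketch imitates for n = 1. It rests on assumptions: `unpair` is a diagonal bijection standing in for the isomorphism of 4.5 and 4.7 (whose exact form is not reproduced here), `G` is a hypothetical example, and the optional `limit` exists only so that the demonstration can stop outside the domain of definition.

```python
def unpair(u):
    """A bijection Z+ -> (Z+)^2, enumerating pairs diagonal by diagonal."""
    s = 2                           # pairs (a, b) with a + b == s
    while u > s - 1:                # diagonal s contains s - 1 pairs
        u -= s - 1
        s += 1
    return u, s - u

def from_graph(G, limit=None):
    """Build g from G, where the 1-level of G(x, eta, z) projects onto the graph of g."""
    def g(x):
        u = 1
        while limit is None or u <= limit:
            eta, z = unpair(u)
            if G(x, eta, z) == 1:   # least u with H(x, u) == 1
                return eta          # first component of the pair coded by u
            u += 1
        return None                 # search abandoned; g(x) may be undefined
    return g

# Hypothetical example: the graph of the exact square root, defined on squares only.
def G(x, eta, z):
    return (x - eta * eta) ** 2 + 1

root = from_graph(G, limit=10_000)
```

With `limit=None` the function `g` behaves like a genuine partial function: it returns on the domain of definition and loops forever elsewhere.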
4.16. Corollary. Every partial recursive function $g$ has a description in which 
the $\mu$-operator is applied only once. 


4.17. Corollary. Every partial recursive function $g$ that is everywhere defined 
has a description $g_1,\dots,g_n = g$ in which all the functions $g_i$ are everywhere 
defined. 


In fact, the description whose last part (starting with $G$) was constructed 
in 4.15 has this property. 


4.18. Corollary. The class of nonempty recursively enumerable sets coincides 
with the class of sets of values of primitive recursive functions. 

In fact, the set of values of a function $f$ is a projection of the graph of $f$. 
Conversely, let $E \subset (\mathbf{Z}^+)^n$ be a nonempty enumerable set that is the projection 
onto the $(x_1,\dots,x_n)$-space of the 1-level of a primitive recursive function 
$f(x_1,\dots,x_n,y)$. Let $(e_1,\dots,e_n)$ be an arbitrary member of $E$. Then $E$ coincides 
with the set of values of the primitive recursive function 


$$g(z) = \begin{cases} (\tau_1^{(n+1)}(z),\dots,\tau_n^{(n+1)}(z)), & \text{if } f(\tau_1^{(n+1)}(z),\dots,\tau_n^{(n+1)}(z),\tau_{n+1}^{(n+1)}(z)) = 1, \\ (e_1,\dots,e_n), & \text{if not.} \end{cases}$$


4.19. Corollary. 

(a) Finite sets and their complements in $(\mathbf{Z}^+)^n$ are decidable. 

(b) Every partial function from $(\mathbf{Z}^+)^m$ to $(\mathbf{Z}^+)^n$ with a finite domain of 
definition is recursive and computable. 


In fact, the one-point set $\{a\} \subset \mathbf{Z}^+$ is a level for a suitable sum of two step 
functions, and its complement is a level for another such sum. Decidability is 
preserved under finite union and intersection, so we have (a) for $n = 1$. Then 
the isomorphism $\tau^{(n)}$ allows us to infer this result for all $n$. 

This also implies (b), since the graphs of the mappings in (b) are finite, and 
therefore enumerable. 


5 Elements of Recursive Geometry 


5.1. Let $E \subset (\mathbf{Z}^+)^m$ be an enumerable set. We consider the structure on $E$ 
given by the following data: 


(a) $\mathcal{E} = \{E' \mid E' \subset E,\ E' \text{ is enumerable}\}$. 
(b) For every $E' \in \mathcal{E}$, $\mathcal{R}(E') = \{f \mid D(f) = E',\ f : E' \to \mathbf{Z}^+ \text{ is recursive}\}$. 


We let $\mathcal{R}$ = the set of pairs $(E', \mathcal{R}(E'))$, $E' \in \mathcal{E}$. 

We shall show that the structure $\{\mathcal{E}, \mathcal{R}\}$ has much in common with the 
structure “a topological space together with a sheaf.” This allows us to find 
natural interpretations for certain well-known results about enumerable sets, 
and to ask new questions suggested by analogies with other geometrical theories. 

We begin with some simple observations. 
5.2. $\mathcal{E}$ is a lattice, i.e., it is closed with respect to finite unions and intersections. 

Since $\mathcal{E}$ is not closed with respect to arbitrary infinite unions, we cannot 
consider $\mathcal{E}$ as the system of open subsets of $E$ in some topology. Nevertheless, 
in Section 5.9 below we show that $\mathcal{E}$ is stable with respect to an important class 
of infinite unions. We shall say that $\mathcal{E}$ determines a quasitopology on $E$ (which 
has properties similar to those of Grothendieck topologies, but does not satisfy 
all the axioms of the latter). 


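5.3. Let $E', E'' \in \mathcal{E}$ and $E' \subset E''$. Then the restriction of functions to $E'$ gives 
a mapping $\mathcal{R}(E'') \to \mathcal{R}(E')$: $f \mapsto f|_{E'}$. 

In fact, let $c_{E'} \in \mathcal{R}(E')$ and $c_{E'} = 1$ on $E'$. Then $f|_{E'} = f c_{E'}$ is recursive 
whenever $f$ and $c_{E'}$ are. 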


5.4. Let $E' = \bigcup_{k=1}^{n} E_k$, where $E', E_k \in \mathcal{E}$. Suppose that the $f_k$ are in $\mathcal{R}(E_k)$ 
and are compatible on intersections: 


$$\forall i,j \le n, \quad f_i|_{E_i \cap E_j} = f_j|_{E_i \cap E_j}.$$


Then there exists an (obviously unique) function $f \in \mathcal{R}(E')$ such that 
$\forall k \le n$, $f|_{E_k} = f_k$. 

We need only verify that $f \in \mathcal{R}(E')$, since there obviously exists a function 
$f : E' \to \mathbf{Z}^+$ that is “glued together” from the $f_k$. But the graph of $f$ is the 
union of the finitely many enumerable graphs $\Gamma_{f_k} \subset E' \times \mathbf{Z}^+$, and so is itself 
enumerable. We then use Proposition 4.15. 

The results 5.3 and 5.4 allow us to consider $\mathcal{R}$ as a sheaf on the 
quasitopology $\mathcal{E}$. 


5.5. Let $E_1$ and $E_2$ be enumerable sets, and let $f : E_1 \to E_2$ be a recursive 
function. Then $f$ induces a morphism of the corresponding quasitopologies with 
sheaves in the following sense: 


(a) If $E' \subset E_2$ is enumerable, then $f^{-1}(E') \subset E_1$ is enumerable. 
(b) For every $E' \subset E_2$, composition with $f$ determines a mapping 
$f^{*} : \mathcal{R}(E') \to \mathcal{R}(f^{-1}(E'))$. 

The first part follows because $c_{f^{-1}(E')} = c_{E'} \circ f$ is recursive whenever $c_{E'}$ 
and $f$ are; the second part is obvious. 
One might get the impression that the pair $(\mathcal{E}, \mathcal{R})$ completely characterizes 
$E$ independently of the embedding $E \subset (\mathbf{Z}^+)^m$. However, this is not the case. 


5.6. Proposition. Let $E_1$ and $E_2$ be enumerable infinite sets. Then there exists 
a bijection $f : E_1 \to E_2$ such that $f$ and $f^{-1}$ are (partial) recursive. $f$ induces 
an isomorphism $(\mathcal{E}_1, \mathcal{R}_1) \to (\mathcal{E}_2, \mathcal{R}_2)$. 


PROOF. We establish the following more precise facts: 


(a) If $E \subset \mathbf{Z}^+$ is infinite and decidable, then there exists a general recursive 
bijection $f : \mathbf{Z}^+ \to E$ for which $f^{-1}$ is (partial) recursive and is an increasing 
function. The converse is also true. 

(b) If $E \subset \mathbf{Z}^+$ is infinite and enumerable, then there exists a general recursive 
bijection $f : \mathbf{Z}^+ \to E$ with $f^{-1}$ (partial) recursive. 


First suppose $E$ is decidable, and let $g(x) = 2$ for $x \in E$, $g(x) = 1$ for 
$x \notin E$, and $h = c_E$. We set 


$$f(x) = \min\Bigl\{ y \Bigm| \Bigl(\sum_{i=1}^{y} g(i) - y - x\Bigr)^2 + 1 = 1 \Bigr\} = \text{the } x\text{th element of } E.$$


It is easy to see that 


$$f^{-1}(x) = \Bigl(\sum_{y=1}^{x} g(y) - x\Bigr) h(x) \quad \begin{cases} \text{is equal to the index of } x \text{ as an element of } E, & \text{if } x \in E; \\ \text{is not defined,} & \text{otherwise.} \end{cases}$$


Now suppose $E$ is enumerable. By Corollary 4.18, there exists a primitive 
recursive function $g : \mathbf{Z}^+ \to E$ whose image coincides with $E$. We shall adjust 
$g$ so that it becomes bijective. We set 


$$F = \{k \in \mathbf{Z}^+ \mid \forall i < k,\ g(i) \ne g(k)\}.$$


This set is decidable, since it is the 1-level of the following primitive recursive 
function $h$: 


$$h(1) = 1; \qquad h(k) = \prod_{i=1}^{k-1} s\bigl((g(i) - g(k))^2 + 1\bigr), \ \text{for } k \ge 2;$$
$$s(x) = \begin{cases} 1, & \text{for } x \ge 2, \\ 2, & \text{for } x = 1. \end{cases}$$


By the previous result, there exists a recursive bijection $\tilde{g} : \mathbf{Z}^+ \to F$. Let 
$f = g \circ \tilde{g}$. Since $g|_F : F \to E$ is a bijection, it follows that $f : \mathbf{Z}^+ \to E$ is also 
a bijection. The inverse function is partial recursive because 


$$f^{-1}(x) = \min\{y \mid (f(y) - x)^2 + 1 = 1\}.$$


The proposition is proved. 
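
The second half of the proof (removing repetitions from an enumeration) can be imitated as follows. This is a sketch under assumptions: the enumeration `g` is a hypothetical example, and a `seen` set replaces the explicit comparison of g(k) with all earlier values g(i), i < k; the indices k that pass the test are exactly the elements of the set F of the proof.

```python
from itertools import count, islice

def injective_enumeration(g):
    """Yield g(k) only for those k with g(i) != g(k) for all i < k."""
    seen = set()
    for k in count(1):
        value = g(k)
        if value not in seen:       # k belongs to F
            seen.add(value)
            yield value

def inverse(f_values, x):
    """f^{-1}(x) = min{y | f(y) == x}; loops forever if x is never produced."""
    for y, value in enumerate(f_values, start=1):
        if value == x:
            return y

# Hypothetical enumeration with repetitions: 1, 2, 2, 3, 3, 4, 4, ...
def g(k):
    return k // 2 + 1

first_values = list(islice(injective_enumeration(g), 5))
```

Here `inverse(injective_enumeration(g), 4)` returns the position of 4 in the repetition-free listing.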


Because of this result we usually consider the embedding $E \subset (\mathbf{Z}^+)^m$ to 
be an essential element of the structure on $E$. In particular, we call $E_1$ and 
$E_2$ isomorphic if there exists a bijection between them that is induced by a 
recursive bijection of the ambient spaces. 

The complete classification of enumerable sets up to isomorphism is not 
known, but many subtle results have been obtained in the theory of 
“reducibilities.” We shall only go so far as to show, using a theorem that will be proved 
in the next chapter, that not all enumerable sets are decidable. 
5.7. Families. Suppose that $m > 0$ and $B$ is a set. By a family of $m$-sets (or an 
$m$-family) over the base $B$ we mean any mapping $B \to P((\mathbf{Z}^+)^m)$. If $E_k \subset (\mathbf{Z}^+)^m$ 
is the image of $k \in B$ under this mapping, we also denote this family 
by $\{E_k\}$. We call the set $E = \{(x,k) \mid x \in E_k\} \subset (\mathbf{Z}^+)^m \times B$ the total space of 
the family. 

Similarly, we call a mapping $B \to$ {partial functions from $(\mathbf{Z}^+)^m$ to $\mathbf{Z}^+$} a 
family of $m$-functions over the base $B$. We call the function $f : (x,k) \mapsto f_k(x)$ 
for $x \in D(f_k)$ the total function of the family. 

A family of $m$-sets (resp. $m$-functions) is said to be enumerable if $B \subset (\mathbf{Z}^+)^n$ 
for some $n$ and if the total space is enumerable in $(\mathbf{Z}^+)^m \times (\mathbf{Z}^+)^n$ (resp. the 
total function is partial recursive on $(\mathbf{Z}^+)^m \times (\mathbf{Z}^+)^n$). 

If $\{E_k\}$ is enumerable, then the set $\{k \in B \mid E_k \text{ is nonempty}\}$ is enumerable, 
since it is a projection of the total space $E$. Each of the $E_k$ is enumerable, since 
it is the intersection $E \cap ((\mathbf{Z}^+)^m \times \{k\})$. 

Similarly, if $\{f_k\}$ is enumerable, then the set $\{k \in B \mid f_k$ is not the nowhere 
defined function$\}$ is enumerable, since it is a projection of the domain of 
definition of the total function $f$. Each of the $f_k$ is partial recursive, since 
it is the restriction of $f$ to the enumerable set $D(f) \cap ((\mathbf{Z}^+)^m \times \{k\})$. 

If $\{f_k\}$ is an enumerable family of $m$-functions, then $\{D(f_k)\}$ is an 
enumerable family of $m$-sets (with total space $D(f)$), and $\{\Gamma_{f_k}\}$ is an 
enumerable family of $(m+1)$-sets (with total space $\Gamma_f$, or more precisely, $\Gamma_f$ after 
a permutation of its factors). 

An enumerable family $\{E_k\}$ (respectively $\{f_k\}$) is said to be versal if 
every enumerable $m$-set (resp. any partial recursive $m$-function) is among the 
elements of the family. (The word “versal” is borrowed from algebraic 
geometry, after removing the prefix “uni” which would indicate that each term in 
the family could occur only once.) In §8 of the next chapter we show that 
versal families exist for each $m$. This is one of the central results of the 
theory, since total spaces and total functions of versal families are the starting 
point for practically all investigations of undecidability. Here we limit ourselves to the 
simplest and most fundamental application: 


5.8. Theorem. Let $\{E_k\}$ be a versal family of 1-sets over the base $B \subset \mathbf{Z}^+$. 
Then the set 


$$F = \{k \mid k \in E_k\}$$


is enumerable, but is not decidable. 


PROOF. Let $E \subset \mathbf{Z}^+ \times \mathbf{Z}^+$ be the total space of the family. Then $F$ = the 
projection of $E \cap (\text{diagonal in } \mathbf{Z}^+ \times \mathbf{Z}^+)$ onto the first factor, and therefore is 
enumerable. 

On the other hand, for every $k \in B$ we have $\bar{F} = \mathbf{Z}^+ \setminus F \ne E_k$, since $k$ 
belongs to either $\bar{F}$ or $E_k$, but not to both. Since $\{E_k\}$ is a versal family, $\bar{F}$ 
cannot be enumerable. The theorem is proved. 
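
The diagonal step of this proof can be checked on a finite toy family. The sketch below is an illustration only: the family `family` and the base `base` are hypothetical, and a finite family is of course never versal; the point is merely that the complement of the diagonal set differs from each member E_k at the index k.

```python
def diagonal_set(family, base):
    """Return F = {k in base | k in E_k} for a family given as a dict k -> E_k."""
    return {k for k in base if k in family[k]}

base = range(1, 6)
family = {1: {1, 2}, 2: {3}, 3: {1, 3, 5}, 4: set(), 5: {2, 4}}

F = diagonal_set(family, base)
complement = set(base) - F

# k lies in exactly one of the complement of F and E_k.
separated = all((k in complement) != (k in family[k]) for k in base)
```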
We now show how to use enumerable families to strengthen the results in 
5.2 and 5.4. We return to the notation at the beginning of the section. 


5.9. $\mathcal{E}$ is closed with respect to taking the union of the elements of any 
enumerable family of subsets of $E$. 

In fact, suppose that $\{E_k\}$ is such a family and $E'$ is its total space, where 
$E' \subset (\mathbf{Z}^+)^m \times (\mathbf{Z}^+)^n$. Then 


$$\bigcup_{k \in B} E_k = \text{the projection of } E' \text{ on } (\mathbf{Z}^+)^m.$$


5.10. Suppose that $\{f_k\}$ is an enumerable family of partial functions on $E$, 
$E_k = D(f_k)$, $E' = \bigcup_{k \in B} E_k$, and 


$$f_i|_{E_i \cap E_j} = f_j|_{E_i \cap E_j} \quad \text{for all } i, j \in B.$$


Then there exists a unique function $f \in \mathcal{R}(E')$ that is glued together from 
the $f_k$. 

In fact, the graph $\Gamma_f$ is enumerable, since it is the union of the enumerable 
family of enumerable sets $\Gamma_{f_k}$. 


5.11. After these remarks it is natural to consider the following system of ideas 
by way of analogy with the theory of spaces with sheaves. 


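(a) Let $E = \bigcup_{k \in B} E_k$ be a covering of an enumerable set by an enumerable 
family. Then for any $n \ge 1$ the family $\{E_{k_1} \cap \cdots \cap E_{k_n} \mid (k_1,\dots,k_n) \in B^n\}$ 
is also an enumerable covering of $E$. In fact, let $E'$ = the total space of 
$\{E_k\} \subset E \times B$, and let 


$$E'' = \{(x_1,\dots,x_n,k_1,\dots,k_n) \mid x_i \in E_{k_i},\ i = 1,\dots,n\} \cong E' \times \cdots \times E' \ (n \text{ times}).$$

Then the total space of the family $\{E_{k_1} \cap \cdots \cap E_{k_n}\}$ is isomorphic to 
$[(\text{diagonal in } E^n) \times B^n] \cap E''$. 
(b) Using the same notation, we define the “recursive product” 
$\mathcal{R}\Pi_n \subset \prod_{(k_1,\dots,k_n) \in B^n} \mathcal{R}(E_{k_1} \cap \cdots \cap E_{k_n})$ as follows: $\mathcal{R}\Pi_0 = \mathcal{R}(E)$, $\mathcal{R}\Pi_n$ = the set 
of enumerable families $\{f_{(k_1,\dots,k_n)}\}$ over $B^n$ such that 


$$f_{(k_1,\dots,k_n)} \in \mathcal{R}(E_{k_1} \cap \cdots \cap E_{k_n}), \quad \text{for } n \ge 1.$$


(c) For every $n \ge 0$ we have the following boundary mappings: 


$$\partial^i : \mathcal{R}\Pi_n \to \mathcal{R}\Pi_{n+1}, \quad i = 1,2,\dots,n+1:$$
$$(\partial^i f)_{(k_1,\dots,k_{n+1})} = f_{(k_1,\dots,k_{i-1},k_{i+1},\dots,k_{n+1})}\big|_{E_{k_1} \cap \cdots \cap E_{k_{n+1}}}.$$


(Note that we really do not have $\partial^i(\mathcal{R}\Pi_n) \subset \mathcal{R}\Pi_{n+1}$.) 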
It is possible to associate various types of “recursive Čech cohomology 
groups” of the covering $\bigcup_{k \in B} E_k$ to the object 


$$\mathcal{R}\Pi_0 = \mathcal{R}(E) \xrightarrow{\ \partial^1\ } \mathcal{R}\Pi_1 \underset{\partial^2}{\overset{\partial^1}{\rightrightarrows}} \mathcal{R}\Pi_2 \cdots$$


It would be interesting to study such cohomology groups. The result 5.10 shows 
that this complex is “exact at the first term.” 

The reader should not find it hard to imagine what other geometrical 
concepts would look like in this context. In particular, it would be worthwhile to 
study the quotients of enumerable sets by enumerable equivalence relations. 
Higman’s theorem (see Chapter VIII) gives a characterization of groups in the 
category of such objects. 

We conclude by giving several results on the structure of $\mathcal{E}$. Because 
of Proposition 5.6, we need only consider subsets of $\mathbf{Z}^+$; that is, we take 
$\mathcal{E} = \{E' \mid E' \subset \mathbf{Z}^+,\ E' \text{ enumerable}\}$. 


5.12. Proposition. There exist enumerable subsets $F \subset \mathbf{Z}^+$ having an infinite 
complement such that for any infinite $E' \in \mathcal{E}$ we have $F \cap E' \ne \emptyset$, so that $F \cap E'$ 
is infinite. 


Such $F$ are called simple. From a topological point of view they resemble 
dense open sets. 

PROOF. Let $\{E_k\}$ be a versal family of 1-sets over $\mathbf{Z}^+$ with total space 
$E \subset \mathbf{Z}^+ \times \mathbf{Z}^+$. We set $E' = E \cap \{(x,k) \mid x > 2k\}$. Since $E'$ is enumerable, there 
exists a primitive recursive function with image $E'$: 


$$g = (g_1, g_2) : \mathbf{Z}^+ \to E'.$$


Let $h(k) = \min\{z \mid g_2(z) = k\}$, let $f(k) = g_1(h(k))$, and let $F$ denote the set of 
values of $f$. $F$ has an infinite complement, since $f(k) > 2k$. The intersection of 
$F$ with an infinite $E_k$ is nonempty, since any value of $g_1(z)$ when $g_2(z) = k$ lies 
in $E_k$, so that $E_k \cap F \ne \emptyset$. The proposition is proved. 


5.13. Proposition. 


(a) The quotient lattice $\mathcal{E}$/(finite sets) has nontrivial maximal elements. 

(b) Every nonsimple enumerable set with an infinite complement is contained 
in such a maximal element. 

(c) There exist simple enumerable sets with an infinite complement that are not 
contained in any nontrivial maximal set. 


We refer the reader to Rogers’ book for the proof of these and many other 
results. 


VI 


Diophantine Sets and Algorithmic 
Undecidability 


1 The Basic Result 


1.1. In §4 of Chapter V we showed that enumerable sets are the same thing as 
projections of level sets of primitive recursive functions. The projections of the 
level sets of a special kind of primitive recursive function, namely polynomials with 
coefficients in $\mathbf{Z}^+$, are called Diophantine sets. We note that this class does 
not become any larger if we allow the coefficients in the polynomial to lie in $\mathbf{Z}$. 
The basic purpose of this chapter is to prove the following deep result: 


1.2. Theorem (M. Davis, H. Putnam, J. Robinson, Yu. Matiyasevič). All 
enumerable sets are Diophantine. 

The plan of proof is described in §2. §§3-7 contain the intricate yet 
completely elementary constructions that make up the proof itself; these sections are 
not essential for understanding the subsequent material, and may be omitted if 
the reader so desires. 

In §8 we use Theorem 1.2 to prove the existence of versal families of 
enumerable sets and functions. Recall that in §5 of Chapter V this result was shown 
to imply that enumerable sets exist that are undecidable, a fact we shall use in 
Section 1.3 below. 

In §7, which stands somewhat apart from the rest of the chapter, we define 
the Kolmogorov complexity of recursive functions, establish the basic properties 
of this concept, and prove that the problem of computing the complexity is 
algorithmically undecidable. 

In Chapter VII the following corollary of Theorem 1.2 will be used in an 
essential way: enumerable sets are definable in $L_1\mathrm{Ar}$. In fact, by their very 
definition, Diophantine sets are defined by formulas of the form $\exists x_1 \cdots \exists x_n (p)$, 
where $p$ is an atomic formula. 


In the remainder of this section we describe the principal applications of 
Theorem 1.2: settling Hilbert’s tenth problem, constructing polynomials that 
take only and all prime number values in Z*, and so on. 
1.3. Hilbert’s tenth problem. Hilbert stated it as follows: 


Suppose we are given a Diophantine equation with an arbitrary number 
of unknowns and with rational integer coefficients. Give a way in which 
it is possible to determine after a finite number of operations whether 
this equation is solvable in rational integers. 


We show that the combination of Theorem 1.2, Theorem 5.8 of Chapter V 
(which follows from Theorem 1.2), and Church’s thesis implies that this problem 
is undecidable. 

First of all, any natural number is the sum of four integer squares (Lagrange). 
Hence $f(x_1,\dots,x_n) = 0$ is solvable in $(\mathbf{Z}^+)^n$ if and only if the equation 
$f(1 + \sum_{i=1}^{4} y_{i1}^2, \dots, 1 + \sum_{i=1}^{4} y_{in}^2) = 0$ is solvable in $\mathbf{Z}^{4n}$. Consequently, it is 
sufficient to show that the mass problem “determining whether there are 
solutions in $(\mathbf{Z}^+)^n$” (see Section 2.6 of Chapter V) is algorithmically undecidable. 
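
The substitution used in this reduction can be made concrete. The following Python sketch finds, by brute force, four integer squares summing to a given natural number, and uses them to write a positive integer x in the form 1 + a² + b² + c² + d². The function names are illustrative, and the search is not meant to be efficient.

```python
from itertools import product
from math import isqrt

def four_squares(m):
    """Return (a, b, c, d) with a*a + b*b + c*c + d*d == m, for m >= 0."""
    r = isqrt(m)
    for a, b, c in product(range(r + 1), repeat=3):
        rest = m - a * a - b * b - c * c
        if rest >= 0 and isqrt(rest) ** 2 == rest:
            return a, b, c, isqrt(rest)
    raise AssertionError("unreachable, by Lagrange's four-square theorem")

def as_one_plus_four_squares(x):
    """Write a positive integer x as 1 + a^2 + b^2 + c^2 + d^2."""
    return four_squares(x - 1)
```

Conversely, every value 1 + a² + b² + c² + d² with integer a, b, c, d is a positive integer, which is why solvability in positive integers and solvability of the substituted equation in integers are equivalent.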

Let $E \subset \mathbf{Z}^+$ be an enumerable set that is not decidable. We represent $E$ 
as the projection onto the $t$-coordinate of the 0-level of the polynomial 
$f_t = f(t; x_1,\dots,x_n)$, where $f \in \mathbf{Z}[t,x_1,\dots,x_n]$. The equation $f_{t_0} = 0$, $t_0 \in \mathbf{Z}^+$, has 
a solution if and only if $t_0 \in E$. By the discussion in §2 of Chapter V, the 
corresponding mass problem for the family $\{f_t\}$ is algorithmically decidable if 
and only if the characteristic function of $E$ is computable. But by our choice of 
$E$, this characteristic function is only semicomputable. 

Thus, solvability in integers cannot be determined algorithmically even for 
a suitable one-parameter family of equations. The number of unknowns in the 
equation, and, in general, the codimension of the projection in Theorem 1.2, can 
be reduced to 13 (Matiyasevič, Robinson). The precise minimum is not known, 
although it is an interesting problem. 

Finally, it should be noted that the construction of a Diophantine 
representation for any enumerable set $E$ is completely effective in the sense that given 
a recursive description of $f$ with $D(f) = E$ or of $g$ with $g(\mathbf{Z}^+) = E$, we can 
write out the corresponding polynomial explicitly. The same holds for the 
construction of versal families, of an enumerable undecidable set, and so on. These 
are all constructive assertions, and not simple existence theorems. 


1.4. Polynomials that represent the prime numbers. The search for “explicit 
formulas” for prime numbers was a traditional occupation of dedicated number 
theory enthusiasts for many centuries. Euler found the polynomial $x^2 + x + 41$, 
which takes a long series of only prime values. But it has long been known that 
the set of values at integer points of a polynomial $f$ in $\mathbf{Z}[x_1,\dots,x_n]$ cannot 
consist entirely of prime numbers: for example, if $p$ and $q$ are two sufficiently 
large primes, then the congruence $f \equiv 0 \bmod pq$ can be solved (in infinitely 
many ways). On the other hand, the problem becomes solvable in the class of 
primitive recursive functions: the function $\{i \mapsto \text{the } i\text{th prime}\}$ is itself primitive 
recursive (see §1 of Chapter VII), but for trivial reasons. 
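
Euler's polynomial is easy to examine numerically. The short Python check below (with a naive, purely illustrative primality test) confirms that x² + x + 41 is prime for x = 0, ..., 39 and that the run ends at x = 40, where the value is 41².

```python
def is_prime(n):
    """Naive trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def euler(x):
    return x * x + x + 41

prime_run = [euler(x) for x in range(40)]      # values at x = 0, ..., 39
first_failure = euler(40)                      # 40*40 + 40 + 41
```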

The nontrivial statement of the problem and the problem’s solution involve 
Theorem 1.2: the set of prime numbers is the set of all positive values at points 
in $(\mathbf{Z}^+)^n$ of a certain polynomial in $\mathbf{Z}[x_1,\dots,x_n]$ (or, if we prefer, $n$ may be 
replaced by $4n$; see the reduction step in 1.3). Matiyasevič showed that there is 
a suitable polynomial of degree 37 in 24 variables. 

This is actually a general result that has nothing to do with the specific 
properties of prime numbers: 


1.5. Proposition. Let $E \subset \mathbf{Z}^+$ be a Diophantine set. Then there exists a 
polynomial $g \in \mathbf{Z}[x_0,\dots,x_n]$ such that $E$ coincides with the set of positive values of 
$g$ at points in $(\mathbf{Z}^+)^{n+1}$. 


PROOF. Let $E$ be the projection of the 0-level of the polynomial $f(x_0,x_1,\dots,x_n)$ 
onto the $x_0$-coordinate. We set 


$$g = x_0[1 - f^2(x_0,x_1,\dots,x_n)].$$


Clearly, the positive values of $g$ are precisely the elements of $E$. 
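
A small numerical experiment illustrates why the positive values of g are exactly the elements of E: wherever f does not vanish, the factor 1 − f² is at most 0. The Python sketch below uses a hypothetical example, f(x₀, x₁) = x₀ − 2x₁, for which E is the set of even positive integers, and inspects g on a finite box only.

```python
def f(x0, x1):
    """Hypothetical polynomial whose 0-level projects onto the even numbers."""
    return x0 - 2 * x1

def g(x0, x1):
    """The polynomial x0 * (1 - f^2) from the proof of Proposition 1.5."""
    return x0 * (1 - f(x0, x1) ** 2)

N = 20
positive_values = {
    g(x0, x1)
    for x0 in range(1, N + 1)
    for x1 in range(1, N + 1)
    if g(x0, x1) > 0
}
```

On this box the positive values are the even numbers from 2 to 20; all other values of g are zero or negative.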


It remains only to use the fact that the set of prime numbers is decidable, 
and hence Diophantine by Theorem 1.2. 
The following sets are also sets of positive integer values of polynomials: 


1.6. The sequences $\{1, 10, 100, \dots, 10^k, \dots\}$ and $\{1, 2^2, 3^{3^3}, \dots, n^{n^{\cdot^{\cdot^{n}}}}\ (n \text{ times}), \dots\}$. 
It is amazing that the values of the corresponding polynomials can drop to 
zero and below in neighborhoods of points where these values are so large. 


1.7. The Fermat set $\{n \mid n > 2 \text{ and } x^n + y^n - z^n = 0 \text{ is solvable in } \mathbf{Z}^+\}$. Thus, the 
variable $n$ can be moved from the exponent to the coefficients of a Diophantine 
equation. 


1.8. The set $\{10e_1, 10^2 e_2, \dots, 10^n e_n, \dots\}$, where $e_i$ is the $i$th digit after the 
decimal point in the decimal expansion of $e$ (or $\pi$ or $\sqrt{2}$, or any other “computable” 
irrational number). 


1.9. The set of all partial quotients in the continued fraction expansion of $e$, or 
$\pi$, or $\sqrt[3]{2}$. 

We recall that in the case of $\sqrt[3]{2}$ it is not known whether this set is finite or 
infinite. 

These examples show that many number-theoretic questions reduce to prob- 
lems of the solvability of Diophantine equations. In Chapter VII we shall 
see that in a certain sense, “almost all of mathematics” reduces to such 
problems. 


2 Plan of Proof 


2.1. In this section we introduce some auxiliary notions and give the plan of 
proof for Theorem 1.2. 

We shall temporarily introduce a class of sets that are intermediate 
between enumerable and Diophantine sets. In order to define this class, we 
consider the map that to every subset $E \subset (\mathbf{Z}^+)^n$ associates the set $F \subset (\mathbf{Z}^+)^n$ 
that is given by the following rule: 


$$(x_1,\dots,x_n) \in F \Leftrightarrow \forall k \in [1, x_n],\ (x_1,\dots,x_{n-1},k) \in E.$$


We shall say that $F$ is obtained from $E$ by applying the bounded universal 
quantifier to the $n$th coordinate. We define similarly the operation of applying 
the bounded universal quantifier to any coordinate. 


2.2. Definition-Lemma. Consider the following three classes of subsets of 
$(\mathbf{Z}^+)^n$ for each $n$. 


(I) Projections of level sets of primitive recursive functions. 

(II) The least class of sets that contains the level sets of polynomials with 
integer coefficients and that is closed with respect to taking finite direct 
products, finite unions, finite intersections, projections, and applying the 
bounded universal quantifier. 

(III) Projections of level sets of polynomials with integer coefficients. 


The following assertions hold for these classes: 


(a) The class (I) coincides with the class of enumerable sets, and the class (III) 
coincides with the class of Diophantine sets. We shall call sets in the class 
(II) D-sets. 

(b) (I) $\supset$ (II) $\supset$ (III). 


PROOF. 


(a) In Theorem 4.3 of Chapter V we showed that the class of primitive 
enumerable sets coincides with the class of enumerable sets. The rest of (a) 
merely consists of definitions. 

(b) Only the inclusion (II) $\subset$ (I) is not completely obvious. First of all, the 
$m$-level set of a polynomial $f$ is the same as the 1-level set of the primitive 
recursive function $(f - m)^2 + 1$. Hence, to verify (II) $\subset$ (I) it suffices to show that 
the class (I) is closed with respect to (finite) direct product, union, intersection, 
and the bounded universal quantifier. All except for the last of these were 
established in Lemma 4.8 of Chapter V. 

Finally, suppose $F$ is the image of a primitive enumerable set $E$ under the 
bounded universal quantifier: 


$$(x_1,\dots,x_{n-1},x_n) \in F \Leftrightarrow \forall k \le x_n,\ (x_1,\dots,x_{n-1},k) \in E.$$


Starting with the function $f(x_1,\dots,x_{n-1},x_n; y_1,\dots,y_m)$ whose 1-level projects 
onto $E$, we want to construct a function $g$ whose 1-level projects onto $F$. 
A natural idea is to consider as an approximation to $g$ the product 


$$\prod_{k=1}^{x_n} f(x_1,\dots,x_{n-1},k; y_{1k},\dots,y_{mk}),$$


where the $y_{ik}$ are “independent variables.” The only problem is that the number 
of arguments of this “function” increases with $x_n$. To deal with this, we apply 
the Gödel function $\mathrm{Gd}(k,t)$, which was defined in Section 4.9 of Chapter V. 
The function $g$ will now depend on $x_1,\dots,x_n$ and on $m$ additional arguments 
$t_1,\dots,t_m$: 


$$g(x_1,\dots,x_n; t_1,\dots,t_m) = \prod_{k=1}^{x_n} f(x_1,\dots,x_{n-1},k; \mathrm{Gd}(k,t_1),\dots,\mathrm{Gd}(k,t_m)).$$


This function is primitive recursive, because the $k$th factor is obtained from $f$ 
and $\mathrm{Gd}$ by substitution and identifying arguments, and then $g$ is constructed 
from such factors by recursion. 

We now verify that the set $F$ is the projection of the 1-level of $g$ onto the 
$(x_1,\dots,x_n)$-coordinates. In fact, if $g(x_1,\dots,t_m) = 1$, then for all $1 \le k \le x_n$ 
we have $f(x_1,\dots,x_{n-1},k,\mathrm{Gd}(k,t_1),\dots,\mathrm{Gd}(k,t_m)) = 1$, i.e., for all $1 \le k \le x_n$ 
the point $(x_1,\dots,x_{n-1},k)$ belongs to $E$. This means that $(x_1,\dots,x_n) \in F$. 

Conversely, if $(x_1,\dots,x_n) \in F$, then for $1 \le k \le x_n$ we can lift the point 
$(x_1,\dots,x_{n-1},k)$ to the 1-level of $f$. Let the $y$-coordinates of the resulting point 
be $y_{1,k},\dots,y_{m,k}$. We solve the following system of equations for the $t_i$: 


$$\mathrm{Gd}(k,t_i) = y_{i,k}, \quad \text{for all } 1 \le k \le x_n.$$


This is possible by the fundamental property of $\mathrm{Gd}$. The resulting values for 
the $t_i$, along with $x_1,\dots,x_n$, make $g$ equal to one. This completes the proof of 
Lemma 2.2. 


2.3. The plan for the rest of the proof of Theorem 1.2 is as follows. In §3 we 
show that the classes (I) and (II) coincide, and in §§4—7 we show that (II) and 
(III) coincide. 


2.4. Remark. In the course of proving Lemma 2.2, we obtained the following 
facts, which should always be kept in mind in what follows: 


(a) In the definitions of the classes (I)-(III) we may always replace “level sets” 
by “1-level sets” (by going from $f$ to $(f - m)^2 + 1$). 

(b) All of the classes (I)-(III) are closed with respect to (finite) products, 
intersections, unions, and also projections. (The proof of this for the class (I) 
in Lemma 4.8 of Chapter V is also applicable to the class (III).) 


We encounter much greater difficulty in treating the bounded universal 
quantifier. Indeed, the most technical part of the proof in §§4—7 is concerned 
with showing that the class of Diophantine sets is closed with respect to the 
bounded universal quantifier. 


3 Enumerable Sets Are D-Sets 


Let $f : (\mathbf{Z}^+)^n \to \mathbf{Z}^+$ be a primitive recursive function. Its 1-level can be 
represented as the projection onto the first $n$ coordinates of the set $\Gamma_f \cap [(\mathbf{Z}^+)^n \times \{1\}]$, 
where $\Gamma_f$ is the graph of $f$. Thus, an enumerable set can be obtained as a 
projection of the intersection of the graphs of two primitive recursive functions. 
Since, by definition, the class of D-sets is closed with respect to projections and 
intersections, the assertion in the title of this section follows from the following 
fact: 


3.1. Proposition. The graphs of primitive recursive functions are D-sets. 
PROOF. The graphs of the basic functions are Diophantine. The stability 
of the property of graphs “being D-sets” relative to the composition and 
juxtaposition of functions is verified by the same arguments as in the proof of 
Lemma 4.8 of Chapter V. It remains to prove the stability under recursion. 
We shall first of all need information about the graph of Gödel’s function. Here 
it is more convenient to use $\mathrm{gd}$ instead of $\mathrm{Gd}$. 


3.2. Lemma. The graph of the Gödel function $\mathrm{gd}(u,k,t) = \mathrm{rem}(1 + kt, u)$ is 
Diophantine, and a fortiori, a D-set. 


PROOF. The set 


$$\Gamma_{\mathrm{gd}} = \{(u,k,t,y) \mid y \text{ is the remainder when } u \text{ is divided by } 1 + kt\}$$

is the intersection of the following two sets in $(\mathbf{Z}^+)^4$: 

$E_1$: $y \le 1 + kt$; 

$E_2$: $u - y \ge 0$ and is divisible by $1 + kt$. 

Both $E_1$ and $E_2$ are Diophantine. In fact, $E_1$ is a projection of the 0-level of 
the polynomial $2 + kt - y - y_1$, and $E_2$ is a projection of the 0-level of the 
polynomial $u - y - (1 + kt)(y_2 - 1)$. The lemma is proved. 

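The two conditions in the proof of Lemma 3.2 can be checked numerically. The following Python sketch is only an illustration: the helper names `gd` and `witnesses` are not from the text, and it assumes that remainders are taken in the range 1, ..., 1 + kt (so that every value stays in Z^+), which is the convention under which the 0-level of 2 + kt - y - y_1 with y_1 in Z^+ describes the first set.

```python
def gd(u, k, t):
    """Remainder of u modulo 1 + k*t, taken in 1..1+k*t (assumed convention)."""
    m = 1 + k * t
    r = u % m
    return r if r != 0 else m


def witnesses(u, k, t, y):
    """Return (y1, y2) in Z^+ x Z^+ solving both polynomial equations, or None.

    2 + k*t - y - y1 == 0              (first set of the proof)
    u - y - (1 + k*t)*(y2 - 1) == 0    (second set of the proof)
    """
    m = 1 + k * t
    y1 = 2 + k * t - y
    if y1 < 1 or u < y or (u - y) % m != 0:
        return None
    return y1, (u - y) // m + 1


# The graph of gd is exactly the set of points that admit witnesses.
for u in range(1, 40):
    for k in range(1, 4):
        for t in range(1, 4):
            for y in range(1, 45):
                assert (witnesses(u, k, t, y) is not None) == (y == gd(u, k, t))
```

The auxiliary values y_1 and y_2 are the coordinates that disappear under projection.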

3.3. Corollary. Let $f$ and $g$ be functions of $n$ and $n+2$ arguments, respectively, 
whose graphs are D-sets. Then the following equations determine D-sets in the 
$(x_1,\dots,x_{n+1},u,t,\dots)$-coordinate space (where any additional coordinates may 
follow the $t$): 

$E$: $\mathrm{gd}(u,1,t) = f(x_1,\dots,x_n)$; 

$F$: $\mathrm{gd}(u, x_{n+1} + 1, t) = g(x_1,\dots,x_{n+1}, \mathrm{gd}(u, x_{n+1}, t))$. 

PROOF. Introducing extra coordinates after the $t$ amounts to taking the direct 
product with $(\mathbf{Z}^+)^p$, and this, of course, takes D-sets to D-sets. 

$E$ can be represented as a projection of the intersection of the sets 
$\mathrm{gd}(u,k,t) = w$, $f(x_1,\dots,x_n) = w$, and $k = 1$ (where $k$ and $w$ are auxiliary 
coordinates). Since $\Gamma_{\mathrm{gd}}$ and $\Gamma_f$ are D-sets, the same is true for $E$. 

Similarly, $F$ can be represented as a projection of the intersection of the sets 

$$\mathrm{gd}(u, x_{n+1} + 1, t) = w_1, \qquad \mathrm{gd}(u, x_{n+1}, t) = w_2, \qquad g(x_1,\dots,x_{n+1},w_2) = w_1.$$

These are D-sets, because $\Gamma_g$ and $\Gamma_{\mathrm{gd}}$ are D-sets. 


3.4. PROOF OF PROPOSITION 3.1. Recall that it remains to verify the following 
assertion: Let $h$ be the function defined recursively from functions $f$ and $g$ by 
the equations 

$$h(x_1,\dots,x_n,1) = f(x_1,\dots,x_n),$$
$$h(x_1,\dots,x_n,k+1) = g(x_1,\dots,x_n,k,h(x_1,\dots,x_n,k));$$

then the graph $\Gamma_h$, 

$$(x_1,\dots,x_{n+1},\eta) \in \Gamma_h \iff \eta = h(x_1,\dots,x_{n+1}),$$

is a D-set whenever the graphs $\Gamma_f$ and $\Gamma_g$ are D-sets. 
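First step. We set $\Gamma_h = \Gamma^1 \cup \Gamma^2$, where $x_{n+1} = 1$ on $\Gamma^1$ and $x_{n+1} \ge 2$ on $\Gamma^2$. 
Since 

$$(x_1,\dots,x_{n+1},\eta) \in \Gamma^1 \iff x_{n+1} = 1 \text{ and } \eta = f(x_1,\dots,x_n),$$

it follows that $\Gamma^1$ is the intersection of $\Gamma_f \times \mathbf{Z}^+$ and a D-set, and therefore is a 
D-set. It remains to verify that $\Gamma^2$ is also a D-set. 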

Second step. In the $(x_1,\dots,x_{n+1},\eta,u,t)$-coordinate space we consider the 
sets 

$E_1$: $\eta = \mathrm{gd}(u, x_{n+1}, t)$; 

$E_2$: $\mathrm{gd}(u,1,t) = f(x_1,\dots,x_n)$; 

$E_3$: $x_{n+1} > 1$, $\mathrm{gd}(u,k,t) = g(x_1,\dots,x_n,k-1,\mathrm{gd}(u,k-1,t))$ for all $2 \le k \le x_{n+1}$. 

It is easy to see that $\Gamma^2 = \mathrm{pr} \bigcap_{i=1}^{3} E_i$. In fact, as in §4 of Chapter V, we obtain 
inclusion in one direction by comparing $E_2$ and $E_3$ with the inductive definition 
of $h$, and in the other direction by suitably choosing the parameters $u$ and $t$ in 
Gödel’s function. Thus, it remains to show that the $E_i$ are D-sets. 

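The step "suitably choosing the parameters u and t" can be made concrete with the Chinese remainder theorem. The sketch below is an illustration under stated assumptions, not the construction of Chapter V itself: the function names are hypothetical, t is taken to be a factorial so that the moduli 1 + kt are pairwise relatively prime, and remainders are taken in the range 1, ..., 1 + kt. It needs Python 3.8 or later for the modular inverse `pow(x, -1, m)`.

```python
from math import factorial


def gd(u, k, t):
    """Remainder of u modulo 1 + k*t, taken in 1..1+k*t (assumed convention)."""
    m = 1 + k * t
    r = u % m
    return r if r != 0 else m


def encode(values):
    """Return (u, t) with gd(u, k, t) == values[k-1] for k = 1..len(values)."""
    n = max(len(values), max(values))
    t = factorial(n)  # makes the moduli 1 + k*t pairwise relatively prime
    u, modulus = 0, 1
    for k, r in enumerate(values, start=1):
        m = 1 + k * t
        # Chinese remainder step: keep u modulo `modulus`, force u = r modulo m.
        step = ((r - u) * pow(modulus, -1, m)) % m
        u += modulus * step
        modulus *= m
    return u, t


# h(1) = 1, h(k + 1) = h(k) + (k + 1): the first triangular numbers.
h = [1, 3, 6, 10, 15]
u, t = encode(h)
assert [gd(u, k, t) for k in range(1, 6)] == h
```

A single pair (u, t) thus stores the whole initial segment of values of h, which is what the sets of the second step require.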
Third step. $E_1$ is the graph of gd with some additional coordinates. $E_2$ was 
shown to be a D-set in the proof of Corollary 3.3. 

Finally, $E_3$ is “almost” obtained from the set $F$ in Corollary 3.3 by applying 
the bounded universal quantifier to the $x_{n+1}$-coordinate. More precisely (for 
brevity, we ignore the $\eta$-coordinate): 

$$(x_1,\dots,x_{n+1},u,t) \in E_3 \iff \forall k \in [2, x_{n+1}],\ (x_1,\dots,x_n,k-1,u,t) \in F$$
$$\iff \forall k \in [1, x_{n+1} - 1],\ (x_1,\dots,x_n,k,u,t) \in F.$$

Consequently, if we apply to $F$ the bounded universal quantifier in the 
$x_{n+1}$-coordinate, we obtain a D-set that is the same as $E_3$ with the $x_{n+1}$-coordinates 
of all its points decreased by 1. So it remains to see that the operation of shifting 
back by 1 preserves the property of “being a D-set,” and this follows easily from 
the definitions. The proof is complete. 


4 The Reduction 


4.1. The next three sections are devoted to proving that the class of D-sets 
coincides with the class of Diophantine sets. As noted at the end of §2, it 
suffices to show that the class of Diophantine sets is closed with respect to the 
bounded universal quantifier. 

Let $f(x_1,\dots,x_n,k,y_1,\dots,y_m)$ be any nonconstant polynomial with integer 
coefficients. $f$ will be fixed for the duration of this section. Let $d$ be the degree 
of $f$, and let $c$ be the sum of the absolute values of its coefficients. 

We define the set $E$ by the condition 

$$(x_1,\dots,x_n,y) \in E \iff \forall k \le y\ \exists (y_1,\dots,y_m),\quad f(x_1,\dots,x_n,k,y_1,\dots,y_m) = 0.$$

We want to show that $E$ is Diophantine. In this section we prove the following 
reduction step, which is due to Davis, Putnam, and Robinson. 


4.2. Proposition. $E$ is Diophantine if the following three sets are Diophantine: 

$$x_1 = x_2^{x_3};$$
$$x_1 = x_2!;$$
$$\frac{x_1}{x_2} = \binom{x_3/x_4}{x_5}, \quad x_3 \ge x_4 x_5,$$

where $\binom{n}{k} = n(n-1)\cdots(n-k+1)/k!$ is the “binomial coefficient.” 


The proof of this and all subsequent propositions of this type follows a 
standard pattern. To show that $E$ is Diophantine, we introduce auxiliary sets 
$E_i$ with the following properties: 

(a) $E = \mathrm{pr} \bigcap_{i=1}^{N} E_i$; 
(b) the $E_i$ are Diophantine. 

But usually we are not able to establish directly that all the $E_i$ are Diophantine, 
so we apply the same procedure to certain of the $E_i$. Thus, the proof that $E$ is 
Diophantine has a treelike pattern. 

The exposition of each step will consist of the following stages: the 
introduction of auxiliary variables, which disappear when we project; explicit 
construction of the sets $E_i$; the proof of the inclusion $E \subset \mathrm{pr} \bigcap_{i=1}^{N} E_i$; and the 
proof of the inclusion $E \supset \mathrm{pr} \bigcap_{i=1}^{N} E_i$. 


4.3. PROOF OF PROPOSITION 4.2. We denote the auxiliary variables by the 
symbols $Y, N, K, Y_1,\dots,Y_m$. We introduce the sets $E_i$ in the 
$(x_1,\dots,x_n,y,Y,N,K,Y_1,\dots,Y_m)$-space by the following relations: 

$E_1$: $N > c \cdot (x_1 \cdots x_n y Y)^d$, $Y < Y_1, \dots, Y < Y_m$ 

(intuitively speaking, the right side of the first inequality gives a rough estimate 
for the value of the polynomial $f$ at the point $(x_1,\dots,x_n,y,y_1,\dots,y_m)$ if all 
$y_i \le Y$). 

$E_2$: $1 + KN! = \prod_{k=1}^{y} (1 + kN!)$ 

(this is a “large modulus”; $f = 0$ will be replaced by divisibility by this 
modulus). 

$E_3$: $f(x_1,\dots,x_n,K,Y_1,\dots,Y_m) \equiv 0 \bmod (1 + KN!)$; 

$E_{3+i}$: $\prod_{j \le Y} (Y_i - j) \equiv 0 \bmod (1 + KN!)$, $i = 1,\dots,m$. 

We define the set $E'$ as $\bigcap_{i=1}^{m+3} E_i$. 


PROOF OF THE INCLUSION $E \subset \mathrm{pr}\, E'$. Given a point $(x_1,\dots,x_n,y) \in E$, we 
must choose values for the other coordinates so that the relations $E_1,\dots,E_{m+3}$ 
are fulfilled. 

By the definition of $E$, each point $(x_1,\dots,x_n,k)$, $k \le y$, can be lifted to the 
0-level of $f$: 

$$f(x_1,\dots,x_n,k,y_{1k},\dots,y_{mk}) = 0.$$

For $Y$ we take the maximum of $y$ and the $y_{ik}$. Then, as before, we find the $Y_i$ 
and $N$ by solving the system of Gödel equations 

$$\mathrm{gd}(Y_i,k,N!) = y_{ik}, \quad \text{for all } 1 \le k \le y.$$

The proof of Gödel’s lemma shows that the $Y_i$ and $N$ may be taken arbitrarily 
large, in particular, so as to satisfy $E_1$. The number $K$ is uniquely determined 
by $E_2$. 

All the choices have now been made. The relation $E_{3+i}$ holds because by 
the definition of $Y_i$ and gd, we can find a number $Y_i - j$ with $j \le Y$, namely 
$j = y_{ik}$, such that $Y_i - j \equiv 0 \bmod (1 + kN!)$, for every $k \le y$. Hence, the product 
on the left in $E_{3+i}$ is divisible by all the $1 + kN!$, $1 \le k \le y$, which are pairwise 
relatively prime, since $N > y$ by $E_1$. Therefore, this product is divisible by 
$1 + KN!$. 

Finally, to verify $E_3$ we note that $E_2$ implies the congruence $K \equiv k \bmod 
(1 + kN!)$, $1 \le k \le y$, because $(1 + KN!) - (1 + kN!) \equiv 0 \bmod (1 + kN!)$ and 
$N!$ is relatively prime to $1 + kN!$. But 
then, since $y_{ik} \equiv Y_i \bmod (1 + kN!)$ by our choice of $Y_i$, we find that 

$$f(x_1,\dots,x_n,K,Y_1,\dots,Y_m) \equiv f(x_1,\dots,x_n,k,y_{1k},\dots,y_{mk}) \equiv 0 \bmod (1 + kN!).$$

Since the moduli $1 + kN!$ are pairwise relatively prime, this congruence 
implies $E_3$. 

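The role of the "large modulus" can be seen on small numbers. The following Python sketch (the function name `large_modulus` and the sample values of y and N are assumptions chosen for illustration) computes K from the relation 1 + KN! = product of the 1 + kN!, and then checks the two facts used above: K is congruent to k modulo 1 + kN!, and the moduli are pairwise relatively prime when N is at least y. It uses `math.prod`, available from Python 3.8.

```python
from math import factorial, gcd, prod


def large_modulus(y, N):
    """Return (K, moduli) with 1 + K*N! equal to the product of the 1 + k*N!."""
    t = factorial(N)
    moduli = [1 + k * t for k in range(1, y + 1)]
    product = prod(moduli)
    # Every factor is congruent to 1 modulo N!, hence so is the product.
    assert (product - 1) % t == 0
    return (product - 1) // t, moduli


y, N = 4, 5  # sample values with N > y
K, moduli = large_modulus(y, N)
for k, m in enumerate(moduli, start=1):
    assert K % m == k % m  # K is congruent to k modulo 1 + k*N!
for i in range(len(moduli)):
    for j in range(i):
        assert gcd(moduli[i], moduli[j]) == 1  # pairwise relatively prime
```

Because K reduces to each k at once, a single congruence modulo 1 + KN! packages the y separate conditions indexed by k.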

PROOF OF THE INCLUSION $\mathrm{pr}\, E' \subset E$. Given a point 

$$(x_1,\dots,x_n,y,Y,N,K,Y_1,\dots,Y_m)$$

whose coordinates satisfy the relations $E_1,\dots,E_{m+3}$, we must find a vector 
$(y_{1k},\dots,y_{mk})$ for each $k \le y$ such that 

$$f(x_1,\dots,x_n,k,y_{1k},\dots,y_{mk}) = 0.$$

To do this we let $p_k$ denote any prime divisor of $1 + kN!$, and we set 
$y_{ik}$ = the remainder when $Y_i$ is divided by $p_k$. 

We claim that these $y_{ik}$ give us the required equality. In fact, $E_3$ implies that 
$f(x_1,\dots,x_n,k,y_{1k},\dots,y_{mk}) \equiv 0 \bmod p_k$. It suffices to show that the number 
on the left is less than $p_k$. We have 

$p_k$ divides $\prod_{j \le Y} (Y_i - j)$ by $E_{3+i}$ 

$\Rightarrow$ $p_k$ divides $Y_i - j$ for some $j \le Y$ 

$\Rightarrow$ $y_{ik}$ = the remainder when $Y_i$ is divided by $p_k$ is $\le Y$ 

$\Rightarrow$ $f(x_1,\dots,x_n,k,y_{1k},\dots,y_{mk}) \le c\,(x_1 \cdots x_n y Y)^d < N < p_k$, 

where the second inequality in the last line follows from $E_1$, and the third 
inequality follows because $p_k$ divides $1 + kN!$. 


CONCLUSION OF THE PROOF. It remains to show that the sets $E_1,\dots,E_{m+3}$ 
are Diophantine if the sets in Proposition 4.2 are Diophantine. In fact, if we 
trivially introduce new variables and make substitutions, we can first reduce 
the verification that all the $E_i$ are Diophantine to showing that the following 
sets are Diophantine: 

$$x_1 = x_2!;$$
$$x_1 = \prod_{k \le x_2} (1 + k x_3);$$
$$x_1 = \prod_{j \le x_3} (x_2 - j), \quad x_2 > x_3.$$

It then remains to notice that the second of these relations can be written in 
the form 

$$x_1 = x_3^{x_2}\, x_2!\, \binom{1/x_3 + x_2}{x_2},$$

and the third relation can be written as 

$$x_1 = x_3!\, \binom{x_2 - 1}{x_3}.$$

This completes the proof of Proposition 4.2. 
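The rewriting of the two product relations in terms of powers, factorials and generalized binomial coefficients can be checked with exact rational arithmetic. The identities tested below are one explicit way of writing that reduction and are stated here as an assumption; `binom` is a hypothetical helper implementing n(n-1)...(n-k+1)/k! for rational n.

```python
from fractions import Fraction
from math import factorial, prod


def binom(n, k):
    """Generalized binomial coefficient n(n-1)...(n-k+1)/k! for rational n."""
    n = Fraction(n)
    return prod((n - i for i in range(k)), start=Fraction(1)) / factorial(k)


# prod_{k <= x2} (1 + k*x3) = x3^x2 * x2! * binom(1/x3 + x2, x2)
for x2 in range(1, 6):
    for x3 in range(1, 6):
        lhs = prod(1 + k * x3 for k in range(1, x2 + 1))
        assert lhs == x3**x2 * factorial(x2) * binom(Fraction(1, x3) + x2, x2)

# prod_{j <= x3} (x2 - j) = x3! * binom(x2 - 1, x3), for x2 > x3
for x3 in range(1, 6):
    for x2 in range(x3 + 1, x3 + 6):
        lhs = prod(x2 - j for j in range(1, x3 + 1))
        assert lhs == factorial(x3) * binom(x2 - 1, x3)
```

In the first identity the upper argument of the binomial coefficient is the rational number (1 + x_2 x_3)/x_3, which is why a binomial coefficient with rational upper entry appears among the three basic sets.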
5 Construction of a Special Diophantine Set 


5.1. In this section we begin the proof that the three sets in Proposition 4.2 
are Diophantine. In order that the reader may better appreciate this stage in 
the proof, we mention that the most troublesome obstacle here is the rapid 
growth of one of the coordinates in comparison to the others (for example, 
$x_1 = x_2!$). J. Robinson had the following key idea. She proved that if we 
know that any specific set in $(\mathbf{Z}^+)^2$ is Diophantine and has one coordinate that 
grows faster than any power of the other but slower than, say, $x^x$ (for example, 
exponentially), we may then conclude that all enumerable sets are Diophantine. 
After this, Matijasevič and Čudnovskiĭ were able to show that a certain set of 
that type (connected with Fibonacci numbers) is Diophantine. For a history of 
the question, see Matijasevič’s article “Diophantine Sets” in Uspehi Mat. Nauk, 
vol. XXVII, No. 5 (1972) (translated in Russian Math. Surveys). 

In this section we give a construction that is an improved version of 
the original construction. Its idea is based on the following observation. Let 
$x^2 - dy^2 = 1$ be Pell’s equation (where $d \in \mathbf{Z}^+$ is not a perfect square). Its 
solutions $(x, y) \in (\mathbf{Z}^+)^2$ form a semigroup with composition law 

$$(x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d}) = x_3 + y_3\sqrt{d}.$$

This is a cyclic semigroup. That is, let $(x_1, y_1)$ be the solution with the least first 
coordinate. Then any other solution has the form $(x_n, y_n)$, where $n \in \mathbf{Z}^+$, and 

$$x_n + y_n\sqrt{d} = (x_1 + y_1\sqrt{d})^n.$$

We call $n$ the number of the solution $(x_n, y_n)$. 

The coordinates $x_n$ and $y_n$ grow exponentially with $n$, so that the set of 
solutions of Pell’s equation, and also the projections of this set on the $x$- and 
$y$-axes, are Diophantine sets having logarithmic density. This is not yet enough: 
we still have the problem of including the solution number $n$ among the 
coordinates of a Diophantine set. Only then can we apply Robinson’s 
technique. This is what will be done below. 


5.2. Notation. We consider Pell’s equation with variable $d$. Its first solution 
generally varies as a function of $d$ in an uncontrollable fashion, so that it is 
convenient to choose only those $d$ whose first solutions have the simple special 
form $(a,1)$, $a \in \mathbf{Z}^+$. Obviously, then $d = a^2 - 1$. 

We shall call the equation $x^2 - (a^2 - 1)y^2 = 1$ the $a$-equation. We define the 
two sequences $x_n(a)$ and $y_n(a)$ as the coordinates of its $n$th solution: 

$$x_n(a) + y_n(a)\sqrt{a^2 - 1} = \left(a + \sqrt{a^2 - 1}\right)^n.$$

For each $n$, a formal definition of $x_n(a)$ and $y_n(a)$ as polynomials in $a$ can easily 
be given by induction on $n$. Then the expressions $x_n(a)$ and $y_n(a)$ will make 
sense for all $n \in \mathbf{Z}$ and $a \in \mathbf{C}$. In particular, 

$$x_n(1) = 1, \quad y_n(1) = n;$$


and all the formulas given below remain true. 
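The inductive definition can be written out directly: multiplying x + y√(a²-1) by a + √(a²-1) gives the next solution. The Python sketch below (the function name `pell` is hypothetical) uses this step to compute x_n(a) and y_n(a) for n ≥ 0 and checks the a-equation together with the special values at a = 1.

```python
def pell(n, a):
    """Return (x_n(a), y_n(a)) for n >= 0.

    One step multiplies x + y*sqrt(a^2 - 1) by a + sqrt(a^2 - 1):
    (x, y) -> (a*x + (a*a - 1)*y, x + a*y).
    """
    x, y = 1, 0  # the 0th solution
    for _ in range(n):
        x, y = a * x + (a * a - 1) * y, x + a * y
    return x, y


for a in range(1, 6):
    for n in range(1, 8):
        x, y = pell(n, a)
        assert x * x - (a * a - 1) * y * y == 1  # the a-equation

for n in range(1, 8):
    assert pell(n, 1) == (1, n)  # x_n(1) = 1, y_n(1) = n
```

Since only integer additions and multiplications by a occur, the same loop also shows that x_n(a) and y_n(a) are polynomials in a with integer coefficients.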
The basic result of this section is the following: 


5.3. Proposition. The set 

$$E:\quad y = y_n(a), \quad a > 1,$$

in the $(y,n,a)$-space is Diophantine. 

The proof uses the elementary number-theoretic properties of the sequences 
$x_n(a)$ and $y_n(a)$, most of which will be verified at the end of the section (see 
5.8). The idea for determining $n$ in a Diophantine way from $(y, a)$ is to observe 
that $y_n(a) \equiv n \bmod (a - 1)$ (Lemma 5.4). This uniquely determines $n$ as long 
as $n < a - 1$. To pass to the general case, we introduce an auxiliary $A$-equation 
with $A$ large, and find formulas for its $n$th solution (using $y$) in which $n$ appears 
in only a Diophantine context. 

Formally, the proof that $E$ is Diophantine follows the pattern described in 
4.2. In addition to the basic variables $y, n, a$, we introduce six auxiliary variables: 
$x, x_1, y_1, A, x_2, y_2$. We set 

$E_1$: $y \ge n$, $a > 1$; 
$E_2$: $x^2 - (a^2 - 1)y^2 = 1$; 
$E_3$: $y_1 \equiv 0 \bmod 2x^2y^2$; 
$E_4$: $x_1^2 - (a^2 - 1)y_1^2 = 1$; 
$E_5$: $A = a + x_1^2(x_1^2 - a)$; 
$E_6$: $x_2^2 - (A^2 - 1)y_2^2 = 1$; 
$E_7$: $y_2 - y \equiv 0 \bmod x_1^2$; 
$E_8$: $y_2 \equiv n \bmod 2y$. 

Let $E' = \bigcap_{i=1}^{8} E_i$. We show that $\mathrm{pr}\, E' = E$. 

The inclusion $E \subset \mathrm{pr}\, E'$. Given $(y,n,a) \in E$, we must find values for the 
other variables such that $E_1,\dots,E_8$ hold. As before, we shall not introduce any 
new symbols for these values; after we choose, say, a value for $x$, the letter $x$ 
will become the name for this value. 

$E_1$ is automatically satisfied: $y_n(a) \ge n$ for all $a > 1$, $n \ge 1$ (induction on 
$n$). We find $x$ uniquely from $E_2$: $x = x_n(a)$. We take $(x_1, y_1/2x^2y^2)$ to be 
any solution of the Pell equation $X^2 - (a^2 - 1)(2x^2y^2)^2 Y^2 = 1$; this gives $E_3$ and $E_4$. 
$A$ is found uniquely from $E_5$. We take $(x_2, y_2)$ to be the $n$th solution of the 
$A$-equation. Now all choices have been made. To verify $E_7$ and $E_8$ we need two 
lemmas. 


5.4. Lemma. $y_k(a) \equiv k \bmod (a - 1)$. 


5.5. Lemma. If $a \equiv b \bmod c$, then $y_n(a) \equiv y_n(b) \bmod c$. 


These lemmas will be proved in 5.8. 
We use these lemmas as follows. From $E_5$ we obtain 

$$A = a + (1 + (a^2 - 1)y_1^2)(1 + (a^2 - 1)y_1^2 - a) \equiv 1 \bmod 2y,$$

because of $E_3$. Lemma 5.4 then gives $y_2 = y_n(A) \equiv n \bmod 2y$; this is $E_8$. 
Lemma 5.5 gives $y_n(A) \equiv y_n(a) \bmod x_1^2$ (because of $E_5$); this is $E_7$. 

The inclusion $\mathrm{pr}\, E' \subset E$. From the relations $E_1,\dots,E_8$ we have only to 
prove that $n$ is the number of the solution $(x,y)$. Note that $n$ occurs only 
in $E_8$. 

For the time being we let $N$, $N_1$, and $N_2$ denote the numbers of the solutions 
$(x,y)$, $(x_1,y_1)$, and $(x_2,y_2)$, respectively. We shall prove that 

$$n \equiv N \quad \text{or} \quad n \equiv -N \bmod 2y.$$

Since we also have $y \ge n$ (by $E_1$) and $y \ge N$ (by the definition of $N$), it follows 
that $n = N$, as required. The number $N_2$ will be the “stepping stone” to get 
from $n$ to $N$. 

First of all, as before, it follows from $E_5$ that $A \equiv 1 \bmod 2y$, and then it 
follows from the definition of $N_2$ and Lemma 5.4 that $y_2 \equiv N_2 \bmod 2y$. But by 
$E_8$ we have $y_2 \equiv n \bmod 2y$; hence 

$$N_2 \equiv n \bmod 2y.$$

Next, $A \equiv a \bmod x_1^2$ by $E_5$, and then $y_2 = y_{N_2}(A) \equiv y_{N_2}(a) \bmod x_1^2$ by 
Lemma 5.5. Using $E_7$, we have $y = y_N(a) \equiv y_2 \bmod x_1^2$. Hence 

$$y_N(a) \equiv y_{N_2}(a) \bmod x_1^2.$$
We now need two more lemmas, which will be proved in 5.8. 


5.6. Lemma. If $y_i(a) \equiv y_j(a) \bmod x_n(a)$, where $a > 1$, then either $i \equiv j$ or 
$i \equiv -j \bmod 2n$. 


5.7. Lemma. If $y_i(a)^2$ divides $y_j(a)$, then $y_i(a)$ divides $j$. 

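Lemmas 5.4 through 5.7 lend themselves to a quick numerical sanity check on small indices. The sketch below is illustrative only: the helper names `x` and `y` are hypothetical, and the sequences are computed from the three-term recurrence s_{n+1} = 2a s_n - s_{n-1}, which follows from the definition of the a-equation solutions but is stated here as an assumption.

```python
def x(n, a):
    """x_n(a) from x_0 = 1, x_1 = a, x_{n+1} = 2*a*x_n - x_{n-1}."""
    prev, cur = 1, a
    for _ in range(n - 1):
        prev, cur = cur, 2 * a * cur - prev
    return cur if n >= 1 else prev


def y(n, a):
    """y_n(a) from y_0 = 0, y_1 = 1, y_{n+1} = 2*a*y_n - y_{n-1}."""
    prev, cur = 0, 1
    for _ in range(n - 1):
        prev, cur = cur, 2 * a * cur - prev
    return cur if n >= 1 else prev


R = range(1, 13)
for a in range(2, 5):
    for k in R:
        assert (y(k, a) - k) % (a - 1) == 0            # Lemma 5.4
    for b, c in [(a + 3, 3), (a + 10, 5)]:             # a = b mod c
        for n in R:
            assert (y(n, a) - y(n, b)) % c == 0        # Lemma 5.5
    for n in range(1, 4):
        for i in R:
            for j in R:
                if (y(i, a) - y(j, a)) % x(n, a) == 0:  # Lemma 5.6
                    assert (i - j) % (2 * n) == 0 or (i + j) % (2 * n) == 0
    for i in range(1, 5):
        for j in R:
            if y(j, a) % y(i, a) ** 2 == 0:            # Lemma 5.7
                assert j % y(i, a) == 0
```

Such a check proves nothing, but it is a convenient guard against misreading the indices and moduli in the four statements.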

If we apply Lemma 5.6 with $N$, $N_2$, and $N_1$ in place of $i$, $j$, and $n$, and use the 
last congruence proved, we obtain 

$$N \equiv \pm N_2 \bmod 2N_1.$$

If we apply Lemma 5.7 with $N$ and $N_1$ in place of $i$ and $j$, and use $E_3$, we 
obtain $y \mid N_1$. Hence 

$$N \equiv \pm N_2 \bmod 2y,$$

and since we have already shown that $N_2 \equiv n \bmod 2y$, this completes the proof. 


5.8. PROOF OF THE LEMMAS. We shall write $x_n$ and $y_n$ instead of $x_n(a)$ and 
$y_n(a)$. Using the formula 

$$x_{nk} + y_{nk}\sqrt{a^2 - 1} = \left(x_n + y_n\sqrt{a^2 - 1}\right)^k$$

we find that 

$$y_{nk} = \sum_{\substack{j \le k \\ j \equiv 1 \ (\mathrm{mod}\ 2)}} \binom{k}{j} x_n^{k-j} y_n^{j} (a^2 - 1)^{(j-1)/2}.$$

In particular, 

$$y_{nk} \equiv k x_n^{k-1} y_n \bmod (a^2 - 1),$$

which gives Lemma 5.4 if we set $n = 1$ (since $x_1 = a \equiv 1 \bmod (a - 1)$). In addition, we have 

$$y_{nk} \equiv k x_n^{k-1} y_n \bmod y_n^3.$$

If we replace $nk$, $k$, and $n$ by $n$, $n/k$, and $k$, respectively, we obtain 

$$y_n \equiv \frac{n}{k}\, x_k^{n/k - 1} y_k \bmod y_k^3.$$

Since $x_k$ and $y_k$ are relatively prime, we have 

$$y_n \equiv 0 \bmod y_k^2 \ \Rightarrow\ \frac{n}{k} \equiv 0 \bmod y_k \ \Rightarrow\ n \equiv 0 \bmod y_k,$$

which gives Lemma 5.7. 


If we write $y_n(a)$ as a polynomial in $a$ with integer coefficients whose degree 
and coefficients depend only on $n$, we immediately obtain Lemma 5.5. It remains 
to prove Lemma 5.6. 

First of all, the equation 

$$x_{n \pm m} + \sqrt{a^2 - 1}\; y_{n \pm m} = \left(x_n + \sqrt{a^2 - 1}\; y_n\right)\left(x_m \pm \sqrt{a^2 - 1}\; y_m\right)$$

gives us 

$$x_{n \pm m} = x_n x_m \pm (a^2 - 1) y_n y_m, \qquad y_{n \pm m} = \pm x_n y_m + x_m y_n.$$

Hence, using $(a^2 - 1)y_n^2 = x_n^2 - 1$, 

$$y_{2n \pm m} = y_{n + (n \pm m)} \equiv x_{n \pm m}\, y_n \equiv \pm (a^2 - 1) y_n^2 y_m \equiv \mp y_m \bmod x_n,$$

and, similarly, 

$$y_{4n \pm m} = y_{2n + (2n \pm m)} \equiv -y_{2n \pm m} \equiv \pm y_m \bmod x_n.$$

This means that the class $y_k \bmod x_n$ has period $4n$ as a function of $k$, and within 
$[1, 4n]$ its behavior is determined by its values on the first quarter-period $[1, n]$: 

$$y_{2n \pm m} \equiv \mp y_m, \qquad y_{4n \pm m} \equiv \pm y_m, \qquad \text{for } 1 \le m \le n.$$

If $a \ge 3$, it is clear that Lemma 5.6 follows from these facts and from the 
inequality $y_m < \tfrac{1}{2} x_n$ for $1 \le m \le n$, which, in turn, follows because 

$$4y_n^2 < (a^2 - 1)y_n^2 + 1 = x_n^2.$$

If $a = 2$, then we only have $y_m < \tfrac{1}{2} x_n$ for $m \le n - 1$, but this is still enough 
to complete the proof of the lemma in this case. 


6 The Graph of the Exponential Is Diophantine 


6.1. Proposition. The set 

$$E:\quad m = a^n$$

in the $(m,a,n)$-space is Diophantine. 


PROOF. It suffices to show that $E_0 = E \cap \{a \mid a > 1\}$ is Diophantine. If $a > 1$, 
we easily obtain by induction on $n$ that 

$$(2a - 1)^n \le y_{n+1}(a) \le (2a)^n,$$

in the notation of §5. Hence, for any $N \ge 1$ we have 

$$a^n \left(1 - \frac{1}{2Na}\right)^n = \frac{(2Na - 1)^n}{(2N)^n} \le \frac{y_{n+1}(Na)}{y_{n+1}(N)} \le \frac{(2Na)^n}{(2N - 1)^n} = a^n \left(1 - \frac{1}{2N}\right)^{-n}.$$

Thus, if we choose $N$ large enough that both bounds are sufficiently close to $a^n$, 
then we obtain $a^n = [y_{n+1}(Na)/y_{n+1}(N)]$ (where the brackets here and below 
denote the integral part of a number). $E_0$ is therefore a projection of the set $E_1$: 

$$a > 1,$$
$$0 \le y_{n+1}(Na) - y_{n+1}(N)\, m < y_{n+1}(N),$$
$$N > \ ?,$$

where a suitable lower bound for $N$ must be inserted in place of ?, in such a 
way as to keep the last relation Diophantine. An elementary calculation shows 
that it suffices to set $N > 4n(m + 1)$. The results in §5 then imply that $E_1$ is 
Diophantine if we trivially introduce the auxiliary relations 

$$y = y_{n+1}(N) \quad \text{and} \quad y' = y_{n+1}(Na).$$

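The recovery of a^n as an integral part of a quotient of Pell coordinates can be tried out directly. The sketch below is an illustration with hypothetical helper names; the value N = 4n(a^n + 1) is simply a generous choice made for the experiment (it uses a^n itself, so it only tests the formula and is not a Diophantine bound), and y_n is computed from the recurrence y_{n+1} = 2a y_n - y_{n-1}, stated here as an assumption.

```python
def y(n, a):
    """y_n(a) from y_0 = 0, y_1 = 1, y_{n+1} = 2*a*y_n - y_{n-1}."""
    prev, cur = 0, 1
    for _ in range(n - 1):
        prev, cur = cur, 2 * a * cur - prev
    return cur if n >= 1 else prev


def power_via_pell(a, n):
    """Recover a**n as the integral part of y_{n+1}(N*a) / y_{n+1}(N)."""
    N = 4 * n * (a**n + 1)  # illustrative, deliberately generous choice of N
    return y(n + 1, N * a) // y(n + 1, N)


for a in range(2, 7):
    for n in range(1, 7):
        # The two-sided estimate obtained by induction on n.
        assert (2 * a - 1) ** n <= y(n + 1, a) <= (2 * a) ** n
        assert power_via_pell(a, n) == a**n
```

The point of the construction is that exponentiation, which grows too fast to be expressed directly, is read off from two values of the Diophantine relation y = y_n(a).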

7 The Factorial and Binomial Coefficient Graphs 
Are Diophantine 


In this section we carry out the last series of arguments. 


7.1. Proposition. The set 

$$E:\quad r = \binom{n}{k}, \quad n \ge k,$$

in the $(r,k,n)$-space is Diophantine. 

Here, by definition, $\binom{n}{k} = n(n-1)\cdots(n-k+1)/k!$. We shall need the 
following lemma. 


7.2. Lemma. If $u > n^k$, then $\binom{n}{k}$ = the remainder when $[(u+1)^n/u^k]$ is divided 
by $u$. 


PROOF. We have 

$$(u+1)^n/u^k = \sum_{i=k+1}^{n} \binom{n}{i} u^{i-k} + \binom{n}{k} + \sum_{i=0}^{k-1} \binom{n}{i} u^{i-k}.$$

The first sum is divisible by $u$, and the last sum is less than 1 if $u > n^k$. 

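Lemma 7.2 is easy to test with exact integer arithmetic. In the following Python sketch the function name `binom_via_powers` is hypothetical, and u is taken to be the smallest value the lemma allows; `math.comb` (Python 3.8 or later) serves as the reference value.

```python
from math import comb


def binom_via_powers(n, k, u):
    """Remainder modulo u of the integral part of (u + 1)**n / u**k."""
    return ((u + 1) ** n // u**k) % u


for n in range(1, 9):
    for k in range(1, n + 1):
        u = n**k + 1  # the smallest u with u > n**k
        assert binom_via_powers(n, k, u) == comb(n, k)
```

Reading (u + 1)^n in base u, the binomial coefficients are its digits, and the lemma extracts the digit in position k.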

7.3. PROOF OF PROPOSITION 7.1. We introduce the auxiliary variables $u$ and 
$v$, and take the relations 

$E_1$: $u > n^k$; 
$E_2$: $v = [(u+1)^n/u^k]$; 
$E_3$: $r \equiv v \bmod u$; 
$E_4$: $r < u$; 
$E_5$: $n \ge k$. 

Lemma 7.2 immediately implies that $E = \mathrm{pr} \bigcap_{i=1}^{5} E_i$. $E_1$ is Diophantine because 
of Proposition 6.1; $E_3$, $E_4$, and $E_5$ are obviously Diophantine. It also becomes 
obvious that $E_2$ is Diophantine if we write $E_2$ in the form 

$$u^k v \le (u+1)^n < u^k v + u^k$$

and again use Proposition 6.1. This completes the proof. 


7.4. Proposition. The set $E : m = k!$ is Diophantine. 


7.5. Lemma. If $k > 0$ and $n > (2k)^{k+1}$, then $k! = [n^k/\binom{n}{k}]$. (This is proved 
by some simple estimates.) 


PROOF OF PROPOSITION 7.4. We take the auxiliary variable $n$ and the relations 

$E_1$: $n > (2k)^{k+1}$; 
$E_2$: $m = [n^k/\binom{n}{k}]$. 


The rest is obvious (using Propositions 6.1 and 7.1). 
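The factorial formula of Lemma 7.5 can be checked in the same spirit. The sketch below (function name hypothetical) takes n to be the smallest integer exceeding (2k)^{k+1} and compares the integral part of n^k divided by the binomial coefficient with `math.factorial`.

```python
from math import comb, factorial


def factorial_via_binomial(k):
    """Integral part of n**k / binom(n, k) for the smallest admissible n."""
    n = (2 * k) ** (k + 1) + 1
    return n**k // comb(n, k)


for k in range(1, 8):
    assert factorial_via_binomial(k) == factorial(k)
```

The quotient equals k! times a product of factors 1/(1 - i/n) with i < k, and the size of n keeps that product close enough to 1.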
7.6. Proposition. The set 

$$E:\quad \frac{x}{y} = \binom{p/q}{k}, \quad p \ge qk,$$

in the $(x,y,p,q,k)$-space is Diophantine. 


The proof that follows is a slightly more complicated version of the argument 
in 7.2 and 7.3. 


7.7. Lemma. Let $a > 0$ be an integer such that $a \equiv 0 \bmod q^k k!$ and 
$a > 2^{p-1}p^{k+1}$. Then 

$$\binom{p/q}{k} = a^{-1}\left[a^{2k+1}(1 + a^{-2})^{p/q}\right] - a\left[a^{2k-1}(1 + a^{-2})^{p/q}\right].$$

This is proved using the binomial Taylor series for $(1 + a^{-2})^{p/q}$. The 
inequality $a > 2^{p-1}p^{k+1}$ allows us to throw away all the terms in the first 
sum starting with the $(k+1)$th and all the terms in the second sum starting 
with the $k$th when we take the integral part. The congruence $a \equiv 0 \bmod q^k k!$ 
ensures that the partial sums are integers. 


7.8. PROOF OF PROPOSITION 7.6. We use the auxiliary variables $a$, $u_1$, $u_2$, 
and $v$, and the following relations: 

$E_1$: $a \equiv 0 \bmod q^k k!$; 

$E_2$: $a > 2^{p-1}p^{k+1}$; 

$E_3$: $u_1/u_2 = a^{-1}\left[a^{2k+1}(1 + a^{-2})^{p/q}\right]$; 

$E_4$: $v = a\left[a^{2k-1}(1 + a^{-2})^{p/q}\right]$; 

$E_5$: $x u_2 = y(u_1 - v u_2)$. 

It follows from Lemma 7.7 that $E = \mathrm{pr} \bigcap_{i=1}^{5} E_i$. $E_1$ and $E_2$ are immediately 
seen to be Diophantine from Propositions 6.1 and 7.4. $E_3$ and $E_4$ are shown to 
be Diophantine just as at the end of 7.3, except that this time we must raise 
the inequalities to the $q$th power after clearing denominators. $E_5$ is obviously 
Diophantine. 


This concludes the proof of Theorem 1.2, that enumerable sets coincide with 
Diophantine sets. 


8 Versal Families 


Versal families were defined and first used in Section 5.7 of Chapter V. The 
purpose of this section is to prove their existence, using the result that enumerable 
sets are Diophantine (Theorem 1.2). 


8.1. Theorem. For any $m \ge 0$, versal enumerable families of $m$-sets and 
$m$-functions over the base $\mathbf{Z}^+$ exist and can be effectively constructed. 

PROOF. We divide the proof into several steps. Recall that $\tau^{(2)} : (\mathbf{Z}^+)^2 \xrightarrow{\sim} \mathbf{Z}^+$ 
is the primitive recursive isomorphism constructed in §4 of Chapter V, and 
$(t_1^{(2)}, t_2^{(2)})$ is its inverse. We shall write $t_1$ and $t_2$ for brevity. 


(a) A versal family of polynomials in $\mathbf{Z}^+[x_1, x_2, x_3, \dots]$. We define polynomials 
$f[l] \in \mathbf{Z}^+[x_1, x_2, x_3, \dots]$ by recursion on $l \in \mathbf{Z}^+$ (with $k \ge 1$ below): 

$$f[1] = f[2] = f[3] = 1;$$
$$f[4k] = k;$$
$$f[4k + 1] = x_k;$$
$$f[4k + 2] = f[t_1(k)] + f[t_2(k)];$$
$$f[4k + 3] = f[t_1(k)]\, f[t_2(k)].$$

The definition is correct, since $t_1(k), t_2(k) < 4k + 2$. The image of the map 
$k \mapsto f[k]$ coincides with all of $\mathbf{Z}^+[x_1, x_2, x_3, \dots]$, since it contains $\mathbf{Z}^+$ (in the 
$4k$-places) and all the $x_k$ (in the $(4k+1)$-places), and, whenever it contains two 
polynomials $f[k_1]$ and $f[k_2]$, it contains their sum (in the $(4\tau^{(2)}(k_1,k_2) + 2)$-place) 
and their product (in the $(4\tau^{(2)}(k_1,k_2) + 3)$-place). (Compare with the numbering 
of constructible sets by ordinals in Chapter V.) 

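The numbering of polynomials in step (a) is a short recursive program. The Python sketch below is an illustration under an explicit assumption: the pairing `tau` and its inverse `t` are a Cantor-style bijection chosen for the example and stand in for the isomorphism of Chapter V, which is not reproduced here. Polynomials are returned as expression strings in x1, x2, ...

```python
from functools import lru_cache


def tau(a, b):
    """Cantor-style bijection Z^+ x Z^+ -> Z^+ (a stand-in for tau^(2))."""
    s = a + b - 2
    return s * (s + 1) // 2 + a


def t(k):
    """Inverse of tau: return the pair (t1(k), t2(k))."""
    s = 0
    while (s + 1) * (s + 2) // 2 < k:
        s += 1
    a = k - s * (s + 1) // 2
    return a, s + 2 - a


@lru_cache(maxsize=None)
def f(l):
    """The l-th polynomial, written as a Python expression in x1, x2, ..."""
    if l <= 3:
        return "1"
    k, r = divmod(l, 4)
    if r == 0:
        return str(k)      # f[4k] = k
    if r == 1:
        return f"x{k}"     # f[4k + 1] = x_k
    t1, t2 = t(k)
    op = "+" if r == 2 else "*"
    return f"({f(t1)} {op} {f(t2)})"


# x1 sits in place 5 and the constant 2 in place 8, so x1 + 2 sits in place
# 4*tau(5, 8) + 2.
assert f(4 * tau(5, 8) + 2) == "(x1 + 2)"
```

The recursion terminates because both components of t(k) are at most k, which is smaller than the index 4k + 2 or 4k + 3 being decoded.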

(b) Construction of a versal 1-family over $\mathbf{Z}^+$. Let $E_k$ be the projection onto 
the $x_1$-coordinate of the 0-level of the polynomial $f[t_1(k)] - f[t_2(k)]$. Since all 
the elements of $\mathbf{Z}[x_1,x_2,x_3,\dots]$ can be represented as such a difference, it is 
clear that the family $\{E_k\}$ contains all enumerable sets. 


(c) $\{E_k\}$ is enumerable. We must show that the total space $E = \{(i,j) \mid i \in 
E_j\} \subset \mathbf{Z}^+ \times \mathbf{Z}^+$ is enumerable. We write the condition $i \in E_j$ in the form of 
an $L_1$-type formula, in which all the quantified variables take values in $\mathbf{Z}^+$. We 
use the fact that $f[t_1(j)] - f[t_2(j)] \in \mathbf{Z}[x_1,\dots,x_j]$. We have 

$$(i,j) \in E \iff i \in E_j \iff \exists x_1 \cdots \exists x_j\,(x_1 = i \wedge f[t_1(j)] = f[t_2(j)])$$
$$\iff \exists t\,\big((\exists x_1 \cdots \exists x_j\, \forall k \le j\,(f[k] = \mathrm{Gd}(k,t))) \wedge \mathrm{Gd}(5,t) = i \wedge \mathrm{Gd}(t_1(j),t) = \mathrm{Gd}(t_2(j),t)\big),$$

where $\mathrm{Gd}(k,t)$ is Gödel’s function (see §4 of Chapter V). Furthermore, by 
the definition of $f[k]$, 

$$\exists x_1 \cdots \exists x_j\, \forall k \le j\,(f[k] = \mathrm{Gd}(k,t))$$
$$\iff \forall k \le j\,\big((k \le 3 \wedge \mathrm{Gd}(k,t) = 1) \vee \exists l\,((k = 4l \wedge \mathrm{Gd}(k,t) = l) \vee (k = 4l + 1)$$
$$\vee\ (k = 4l + 2 \wedge \mathrm{Gd}(k,t) = \mathrm{Gd}(t_1(l),t) + \mathrm{Gd}(t_2(l),t))$$
$$\vee\ (k = 4l + 3 \wedge \mathrm{Gd}(k,t) = \mathrm{Gd}(t_1(l),t)\,\mathrm{Gd}(t_2(l),t)))\big).$$

Here the part of the formula after $\exists l$ defines a decidable set in $(k,t,l)$-space. The 
quantifier $\exists l$ projects this set onto the $(k,t)$-coordinates, thereby taking it to 
an enumerable set, and the bounded quantifier $\forall k \le j$ preserves enumerability 
(see §2). Returning to the formula that defines $E$, we find that the set we have 
constructed so far must be intersected with two other decidable sets and then 
projected along the $t$-axis, so that the result is again enumerable. 


(d) Construction of a versal $m$-family over $\mathbf{Z}^+$. The case $m = 0$ is trivial, 
and the case $m = 1$ has already been discussed. The case $m \ge 2$ reduces to the 
case $m = 1$ using the isomorphism $\tau^{(m)} : (\mathbf{Z}^+)^m \xrightarrow{\sim} \mathbf{Z}^+$. In fact, let $E_k = E_k^{(1)}$ 
be a versal 1-family, and set $E_k^{(m)} = (\tau^{(m)})^{-1}(E_k^{(1)})$. The family $\{E_k^{(m)}\}$ is 
enumerable because 

$$E^{(m)} = \{(x,k) \mid x \in E_k^{(m)}\} = \{((\tau^{(m)})^{-1}(x), k) \mid x \in E_k^{(1)}\} = (\tau^{(m)}, \mathrm{pr}_{m+1})^{-1}(E^{(1)}).$$


(e) Construction of a versal family of 1-functions. We take a versal 2-family 
$\{E_k^{(2)}\}$ with total space 

$$E^{(2)} = \{(x,y,k) \mid (x,y) \in E_k^{(2)}\} \subset (\mathbf{Z}^+)^3.$$

Let $g(x,y,k,z)$ be a primitive recursive function such that the projection of its 
1-level onto the $(x,y,k)$-coordinates coincides with $E^{(2)}$. We set 

$$f(x,k) = t_1^{(2)}\big(\min\{u \mid g(x, t_1^{(2)}(u), k, t_2^{(2)}(u)) = 1\}\big).$$

We claim that $\{f_k \mid f_k(x) = f(x,k)\}$ is a versal family of 1-functions. The total 
function is obviously partial recursive. We need only verify that every partial 
recursive 1-function $f$ occurs in the family. 

Let $\Gamma_f$ be the graph of $f$, and let $\Gamma_f = E_{k_0}^{(2)}$, where $k_0 \in \mathbf{Z}^+$. We show that 
$f = f_{k_0}$. In fact, 

$$(x, f(x)) \in \Gamma_f = E_{k_0}^{(2)} \iff (x, f(x), k_0) \in E^{(2)} \iff \exists z \in \mathbf{Z}^+,\ g(x, f(x), k_0, z) = 1.$$

Among the $z \in \mathbf{Z}^+$ that make $g(x, f(x), k_0, z) = 1$, we choose the $z$ for which 
the number $u$ given by $(f(x), z) = (t_1^{(2)}(u), t_2^{(2)}(u))$ is minimal. For this $u$ we 
have $f_{k_0}(x) = t_1^{(2)}(u) = f(x)$, which proves the claim. 


(f) Construction of a versal family of $m$-functions. The case $m = 0$ is trivial. 
If $\{f_k^{(1)}\}$ is a versal family of 1-functions, then for $m \ge 2$ we set 

$$f_k^{(m)}(x_1,\dots,x_m) = f_k^{(1)}(\tau^{(m)}(x_1,\dots,x_m)),$$

thereby obtaining a versal family of $m$-functions. 
The theorem is proved. 


8.2. The choice of versal families is far from unique. If m > 1, there does not 
exist a versal family that contains each function or each set exactly once (i.e., 
a universal family). Nevertheless, there are important methods of extracting 
invariant information from data about the position of a function or set in a 
versal family. The next section is devoted to this question. 


9 Kolmogorov Complexity 


9.1. Let $u = \{u_k\}$ be an enumerable family of $m$-functions over $\mathbf{Z}^+$, and let $f$ 
be a partial recursive $m$-function. We define the complexity of $f$ relative to the 
family $u$ as 

$$C_u(f) = \begin{cases} \min\{k \mid u_k = f\}, & \text{if such a } k \text{ exists;} \\ \infty & \text{otherwise.} \end{cases}$$

We call the enumerable family $u$ (asymptotically) optimal if for any other 
enumerable family $v$, there exists a constant $c_{u,v} > 0$ such that for every partial 
recursive $m$-function $f$ we have 

$$C_u(f) \le c_{u,v}\, C_v(f).$$

If we take $v$ to be any versal family, we see that an optimal family must be 
versal, i.e., $C_u(f)$ never takes the value $\infty$. 


9.2. Theorem (Kolmogorov) 


(a) For any $m \ge 0$, optimal families exist and can be effectively constructed. 
(b) If $u$ and $v$ are optimal families of $m$-functions, then for any $m$-function $f$, 

$$c_{v,u}^{-1}\, C_v(f) \le C_u(f) \le c_{u,v}\, C_v(f).$$


9.3. Remarks 


(a) The measure of complexity $C_u(f)$ involves the following intuitive ideas. 
In order to define any enumerable family $u$, it is necessary to give only a finite 
amount of information, for example, a program that semicomputes the total 
function of $u$. Therefore, in order to define a specific function $f$ that occurs in 
the family $u$, it suffices to give no more than 

$$\log_2 C_u(f) + \mathrm{const}$$

bits of information, namely, the program for $u$ and the number of $f$ in $u$. 


(b) A family being optimal means that it can be used to compute any 
$m$-function, and that the loss in using it rather than any other family to compute 
a function is bounded by a constant that does not depend on the function. 


(c) Finally, the inequality 9.2(b), which follows trivially from the definition 
of an optimal family, shows that to within an additive term that is bounded in 
absolute value, the logarithmic measure of complexity 

$$K_u(f) = [\log_2 C_u(f)] + 1 \quad (\text{where “[ ]” = “integral part”})$$

does not depend on the choice of the optimal family $u$, and so is an asymptotic 
invariant of $f$. 




9.4. PROOF OF THEOREM 9.2. We first choose a recursive embedding $\theta : 
\mathbf{Z}^+ \times \mathbf{Z}^+ \to \mathbf{Z}^+$ that has a recursive inverse function and that satisfies the 
following linear growth condition in one of its arguments: 

$$\theta(k,j) \le k \cdot \varphi(j), \quad \text{for all } k,j \in \mathbf{Z}^+ \text{ and some suitable } \varphi : \mathbf{Z}^+ \to \mathbf{Z}^+.$$

For example, we could let $\theta_1(k,j) = (2k - 1)2^j$ with $\varphi_1(j) = 2^{j+1}$, or, following 
Kolmogorov, we could let 

$$\theta_2(\overline{k_1 k_2 \cdots k_p},\ \overline{j_1 j_2 \cdots j_s}) = \overline{j_1 j_1 j_2 j_2 \cdots j_s j_s 0 1 k_1 \cdots k_p},$$

where $k_\alpha, j_\beta \in \{0,1\}$ and the bar denotes the binary expansion of a number. 
Here $\varphi_2(j) \le \mathrm{const} \cdot j^2$, so that this function grows more slowly. (See also 
Section 9.8 below.) 

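Both embeddings are easy to implement on binary strings. The following Python sketch is illustrative: the function names are hypothetical, and the constant 32 in the growth check is an assumption obtained from a crude bit-length count, not a value taken from the text.

```python
def theta1(k, j):
    """theta_1(k, j) = (2k - 1) * 2**j."""
    return (2 * k - 1) * 2**j


def theta2(k, j):
    """Double every bit of j, append the marker 01, then append the bits of k."""
    doubled = "".join(bit * 2 for bit in bin(j)[2:])
    return int(doubled + "01" + bin(k)[2:], 2)


def theta2_inverse(n):
    """Recover (k, j): read doubled bits of j up to the first pair 01."""
    bits = bin(n)[2:]
    j_bits, pos = "", 0
    while bits[pos:pos + 2] != "01":
        j_bits += bits[pos]
        pos += 2
    return int(bits[pos + 2:], 2), int(j_bits, 2)


for k in range(1, 60):
    for j in range(1, 60):
        assert theta1(k, j) <= k * 2 ** (j + 1)         # phi_1(j) = 2**(j+1)
        assert theta2_inverse(theta2(k, j)) == (k, j)   # recursive inverse
        assert theta2(k, j) <= 32 * k * j * j           # phi_2(j) <= const*j**2
```

The doubled bits make the code of j self-delimiting, so the decoder knows where j ends without being told its length; the price is the factor j² instead of j.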
Now let $U$ be any versal family of $(m+1)$-functions. We define a family $u$ 
of $m$-functions by setting 

$$u(x_1,\dots,x_m; \theta(k,j)) = U(x_1,\dots,x_m,k; j).$$

We show that the family $u$ is optimal, with the following bound for the 
constants: 

$$c_{u,v} \le \varphi(C_U(v)).$$

In fact, let $f$ be a recursive $m$-function. It suffices to consider the case in which 
$f$ occurs in the family $v$. Then 

$$f(x_1,\dots,x_m) = v(x_1,\dots,x_m; C_v(f)) = U(x_1,\dots,x_m,C_v(f); C_U(v)) = u(x_1,\dots,x_m; \theta(C_v(f),C_U(v))),$$

so that 

$$C_u(f) \le \theta(C_v(f), C_U(v)) \le C_v(f)\,\varphi(C_U(v)).$$


The theorem is proved. 


9.5. EXAMPLE. A 0-function $f$ can be identified with the single value it takes, 
i.e., with a positive integer $n$. In this case, Theorem 9.2 gives us an almost 
invariant complexity $C_u(n)$ for the integers. We have: 

(a) $C_u(n) \le \mathrm{const} \cdot n$ for all $n$, since the function “$n$” appears in the $n$th place 
in the simplest versal family $u_n(\cdot) = n$. 

(b) $C(n) \sim \min\{2^{j-1}(2k-1) \mid n$ is the $k$th value of the $j$th function in some versal 
family of 1-functions$\}$. (We write $f \sim g$ if $f$ and $g$ have the same domain 
of definition, and $f \le \mathrm{const} \cdot g$ and $g \le \mathrm{const} \cdot f$ for suitable constants. In 
relations of the type $C_u(f_k) \sim g(k)$, we often omit the designation of the 
optimal family $u$, which we take to be arbitrary, but fixed.) 

It is clear from (b) that the complexity of the numbers $p_n$ (the $n$th prime), 
$2^n$, or 

$$n^{n^{\cdot^{\cdot^{\cdot^{n}}}}} \quad (n \text{ times})$$

as $n \to \infty$ is asymptotically no greater than $\mathrm{const} \cdot n$, since each of these is 
the $n$th value of a fixed recursive function. In 9.7(b) below, we shall lower this 
estimate to $\mathrm{const} \cdot C(n)$. 

Instead of integers, Kolmogorov and his collaborators considered finite 
binary sequences and constructed a theory that showed that the most 
complex binary sequences are those that approach random behavior. See the survey 
article by A. K. Zvonkin and L. A. Levin in Uspehi Matem. Nauk, vol. XXV, 
No. 6 (1970) (translated in Russian Mathematical Surveys), which contains a 
large bibliography. 


9.6. Proposition. 


(a) Let 

$$F = f_0(f_1(x_1,\dots,x_m),\dots,f_n(x_1,\dots,x_m), x_{m+1},\dots,x_p),$$

where the $f_i$ are recursive functions. Then 

$$C(F) \le \mathrm{const} \cdot \prod_{i=1}^{n} C(f_i) \left(\log \prod_{i=1}^{n} C(f_i)\right)^{n-1}$$

if $f_0$ is fixed and the $f_i$ run through all possible $m$-functions. Here const 
depends on $f_0$ and on the families used to compute the complexity, but does 
not depend on $f_1,\dots,f_n$. 

(b) If $f_0$ is also allowed to vary, then $\prod_{i=1}^{n}$ must be replaced by $\prod_{i=0}^{n}$, and $\log^{n-1}$ 
must be replaced by $\log^{n}$ on the right. 


9.7. Special cases 


(a) If, for example, we set $f_0 = \mathrm{sum}_2$ or $\mathrm{prod}_2$, then we have 

$$C(f_1 + f_2),\ C(f_1 f_2) \le \mathrm{const}\ C(f_1)C(f_2)\log(C(f_1)C(f_2)).$$

(b) If we set $n = 1$ and $m = 0$, we find that for any enumerable family $\{f_k\}$, 

$$C(f(k,x_1,\dots,x_p)) \le \mathrm{const}\ C(k).$$


9.8. PROOF OF PROPOSITION 9.6. First of all, for every $n \ge 1$ we define the 
following recursive bijection with a recursive inverse: 

$\theta^{(n)}(k_1,\dots,k_n)$ = the index of the $n$-tuple $(k_1,\dots,k_n)$ if we order $n$-tuples 
according to increasing $\prod_{i=1}^{n} k_i$, and in alphabetical order for fixed $\prod_{i=1}^{n} k_i$. 

It is easy to see (by induction on $n$) that 

$$\theta^{(n)}(k_1,\dots,k_n) \le \mathrm{const} \prod_{i=1}^{n} k_i \left(\log \prod_{i=1}^{n} k_i\right)^{n-1}.$$

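The ordering of n-tuples by product and then alphabetically can be computed by counting. The sketch below is an illustration with hypothetical function names; "alphabetical order" is read as lexicographic order of the tuples, and the bound checked at the end uses 1 + log in place of log with constant 1, which is an assumption made so that the comparison is meaningful when the product equals 1.

```python
from functools import lru_cache
from math import log


@lru_cache(maxsize=None)
def count(product, n):
    """Number of n-tuples of positive integers with the given product."""
    if n == 1:
        return 1
    return sum(count(product // d, n - 1)
               for d in range(1, product + 1) if product % d == 0)


def theta_n(ks):
    """1-based index of ks, ordering by product and then lexicographically."""
    n, product = len(ks), 1
    for k in ks:
        product *= k
    index = sum(count(q, n) for q in range(1, product))
    remaining = product
    for pos, k in enumerate(ks[:-1]):
        # Tuples with the same prefix and a smaller entry here come first.
        for d in range(1, k):
            if remaining % d == 0:
                index += count(remaining // d, n - pos - 1)
        remaining //= k
    return index + 1


pairs = [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (2, 2), (4, 1)]
assert [theta_n(p) for p in pairs] == list(range(1, 9))

for a in range(1, 6):
    for b in range(1, 6):
        for c in range(1, 6):
            P = a * b * c
            assert theta_n((a, b, c)) <= P * (1 + log(P)) ** 2
```

The logarithmic factors come from counting tuples with a given product: the number of n-tuples with product at most P grows like P times a power of log P.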

We define the function © : (Z+)"* — Z*+ as follows: 


O(l1,.--5In41) = 0(0™ (11, ..., In), Inga), 


where @ is as described in 9.4. 

We now consider two optimal families v(x1,...,@p,1) and u(a1,...,2%m,k) of 
p-functions and m-functions, respectively. We use these two families to construct 
the families 

W (25.0025 Up; Rigeie, bay!) 

= v(u(%iys -.) Sins ha) on M21 Bm hn) Pmt ys sy Lay), 
w(t1,..-;2p,k) = W(a1,..-, 2p," (k)). 


The function F occurs in the 


0 (0 (Cu(fi); - ding a ads Cu(fo)) 


place in the family w. Then the estimate 6(k,j) < k- (7), along with the 
estimate for 0, gives assertion (a). 


We similarly obtain (b) if we replace © by 6("+) in the definition of w. 


Remark. The function $\theta^{(n)}$ gives us the most economical estimate for C(F)
that is symmetrical in the $C(f_1),\dots,C(f_n)$. In specific situations it might make
sense to improve the estimate in certain of the $C(f_i)$ at the expense of worsening
the estimate with respect to the others; this is done by suitably changing $\theta$. For
example, Kolmogorov’s $\theta$ gives


Cr+ hy < coust CU fiCer, 


which is better than 


\[ \mathrm{const}\; C(f_1)C(f_2)\log(C(f_1)C(f_2)) \]


if $C(f_2)$ grows much more slowly than $C(f_1)$.


9.9. Theorem. The function C(f) is not computable. More precisely, let g(k)
be any unbounded partial recursive function, and let $\{f_k\}$ be any enumerable
family. Then it is false that $C(f_k)|_{D(g)} \sim g(k)$.


Thus, $C(f_k)$ can be computable (even up to $\sim$) only on a set of indices k such
that there are only finitely many different functions among the functions $f_k$;
otherwise, $C(f_k)$ is not bounded on this set.

PROOF. Suppose that $C(f_k)|_{D(g)} \sim g(k)$. We show that there exists a general
recursive function $h : \mathbb{Z}^+ \to \mathbb{Z}^+$ whose image is contained in D(g) and such that
$g \circ h$ is monotonically increasing. We then obtain a contradiction as follows.
By 9.7(b), for all k we have


\[ C(f_{h(k)}) \le \mathrm{const}\; C(k), \]
and, by our assumption and by the fact that $g \circ h$ is increasing,
\[ C(f_{h(k)}) \ge \mathrm{const}\; g(h(k)) \ge \mathrm{const}\cdot k. \]


But these two inequalities are incompatible, because $\liminf C(k)/k = 0$ (for
example, $C(k^2)/k^2 \le \mathrm{const}/k$).

It remains to construct h. We choose a general recursive bijection $h_1 : \mathbb{Z}^+ \to D(g)$,
using Proposition 5.6 of Chapter V, and we set


\[ E = \{k \mid \forall i < k,\ g(h_1(i)) < g(h_1(k))\}. \]


This set is decidable and infinite, and $g \circ h_1$ is an increasing function on E.

Let $h_2 : \mathbb{Z}^+ \to E$ be an increasing general recursive bijection (again using
Proposition 5.6 of Chapter V). Then $h = h_1 \circ h_2$ has the necessary properties.
The theorem is proved. 
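
The choice of E in this proof can be illustrated on a finite prefix. The following is only a sketch, assuming that the values g(h_1(1)), g(h_1(2)), ... are already available as a Python list; the names `record_indices` and `values` are hypothetical and do not come from the text. It keeps exactly those k whose value strictly exceeds every earlier value, which is the bounded check that makes E decidable.

```python
def record_indices(values):
    """Return the 1-based indices k with values[i] < values[k] for all i < k.

    values[k - 1] plays the role of g(h_1(k)); the result is a finite
    prefix of the set E used in the proof (an illustrative assumption).
    """
    records = []
    best = None  # largest value seen so far
    for k, v in enumerate(values, start=1):
        if best is None or v > best:
            records.append(k)
            best = v
    return records


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]
    E = record_indices(sample)
    # Restricted to E, the sampled values are strictly increasing.
    print(E, [sample[k - 1] for k in E])
```

Enumerating the returned indices in increasing order corresponds to the bijection h_2, so the composite with h_1 picks out an increasing subsequence of values.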


9.10. Remarks 


(a) Theorem 9.9 shows that computing complexity is a problem demanding
creativity: even if we find the number of a place where f occurs in an optimal
family $\{u_i\}$, there is no algorithm that could tell us whether this function occurs
even sooner.


(b) Since $C(k) \ne C(l) \iff k \ne l$, it follows that for all x and B,
\[ \mathrm{card}\,\{y \mid y \le x,\ C(y) \le x/B\} \le x/B, \]


i.e., most numbers have a large complexity. 

Nevertheless, it is not possible to give effectively a sequence of numbers 
that asymptotically have maximal complexity. More precisely, let $\{k_i\}$ be any
increasing sequence with $C(k_i) \ge k_i/B$ for some constant B. Then the set $\{k_i\}$
does not contain a single infinite enumerable set E. Otherwise, we would be
able to find an increasing general recursive function $h : \mathbb{Z}^+ \to E$, and would
obtain a contradiction, as in Theorem 9.9.

(c) Let $u = \{u_k\}$ be any optimal family of m-functions. The “moments of
first appearance” $\{k \mid \forall i < k,\ u_i \ne u_k\}$ actually form a sequence of asymptotically
maximal complexity, since, by the definition and by 9.7(b), they satisfy


\[ k = C_u(u_k) \le \mathrm{const}\cdot C(k). \]


Thus, we might say that in an optimal family the functions first appear “at 
random moments.” 

The problem of computing $C(u_k)$ is complicated by the fact that, at least in
the specific families in the proof of Theorem 9.2, any function appears infinitely
often, so that if we are not lucky we might first notice the function arbitrarily
far out from the place where it first appeared.

(d) Finally, we mention that at least one essential aspect of the complexity
of computations has not been touched upon in our discussion of $C_u$. Namely,
$\log_2 C(k)$ measures the length of a program that could compute k, but says noth-
ing about the time it takes for such a program to work, let alone the possibilities 
for shortening the time by performing parallel computations, lengthening the 
program, and so on. 

The concept of complexity is rather far removed from practical uses. But it 
seems to be such a fundamental idea that its role in theoretical mathematics is 
likely to grow. 


VII 


Gödel’s Incompleteness Theorem


1 Arithmetic of Syntax 


1.1. In this section we show how the syntax of formal languages reduces in 
principle to arithmetic. We do this by identifying the symbols, expressions, and 
texts in a finite or countable alphabet A with certain natural numbers (i.e., by 
numbering them) in such a way that the syntactic operations (juxtaposition, 
substitution, etc.) are represented by recursive functions, and the syntactic 
relations (occurrence in an expression, “being a formula,” etc.) are represented 
by decidable or enumerable sets. 

In Chapter II we described how this technique works for Smullyan’s language 
of arithmetic, but now we shall investigate it more systematically. Our first task 
is to show that the computability of syntactic operations and the decidability 
(enumerability) of syntactic relations on the sets of expressions and texts do not 
depend on how we number them, as long as we adhere to certain weak natural 
restrictions. 

This independence of the method of numbering allows us to consider this 
numbering not only as a technical device, but also as a reflection of a deep equiv- 
alence between arithmetic and the combinatorial properties of formal texts. In 
modern computers, where a single store-location may serve consecutively as a 
number, a name (code), and a command, this equivalence between syntax and 
arithmetic is realized “in the flesh” and is accepted as a basic principle. This 
was not the case, however, in 1931, when Gödel first introduced the concept of
numbering. 


1.2. Numbering. Let S be a finite or countable set. By a numbering of S we
mean any injective map $N : S \to \mathbb{Z}^+$ whose image is decidable. We call N(s)
the N-number of an element $s \in S$. We call two numberings N and M of a set
S equivalent if the partial functions $N \circ M^{-1}$ and $M \circ N^{-1}$ from $\mathbb{Z}^+$ to $\mathbb{Z}^+$
are partial recursive. These functions are automatically computable (not only
semicomputable), since their domains of definition are decidable (see §1-2 of
Chapter V).

The intuitive meaning of these definitions is clear: requiring the set of N(s)
to be decidable ensures that it is possible to determine whether a natural
number has the property of “being the number of an element of S,” and two
numberings are equivalent when each of them can be effectively recovered from
the other for any $s \in S$.


1.3. Lemma. 


(a) The relation of equivalence between numberings is reflexive, symmetric, and 
transitive. 

(b) Any injective map from a finite set S to $\mathbb{Z}^+$ is a numbering, and any two
numberings of a finite S are equivalent.

(c) Any numbering of an infinite set is equivalent to a numbering whose image
is all of $\mathbb{Z}^+$.


All this either is obvious or has already been proved. In particular, (c) 
follows from Proposition 5.2 in Chapter V. 


1.4. Let $S_1$ and $S_2$ be two sets, and let $N_i : S_i \to \mathbb{Z}^+$, i = 1, 2, be numberings
of them. We call a partial function $f : S_1 \to S_2$ partial recursive relative to
$(N_1, N_2)$ if the map $N_2 \circ f \circ N_1^{-1}$ is partial recursive. A tautological example:
any numbering function $N : S \to \mathbb{Z}^+$ is partial recursive relative to
(N, identity).


A subset $T \subset S$ is said to be decidable, enumerable, arithmetical (i.e.,
definable in $L_1$Ar, see Chapter II, §2) relative to the numbering N, if the set
N(T) has the corresponding property.


1.5. Lemma. If $(N_1, N_2)$ is replaced by a pair of equivalent numberings $(N_1', N_2')$
in 1.4, then the classes of recursive functions $f : S_1 \to S_2$ and of decidable,
enumerable, and arithmetical subsets of $S_1$ do not change.


PROOF. The composition of computable recursive functions is recursive and
computable. The inverse image of a decidable (respectively enumerable) set
with respect to a computable function is decidable (respectively enumerable).
Finally, suppose that $f : \mathbb{Z}^+ \to \mathbb{Z}^+$ is a partial recursive function, and that
$E \subset \mathbb{Z}^+$ is an arithmetical set. Then $f^{-1}(E) = \mathrm{pr}_1((\mathbb{Z}^+ \times E) \cap \Gamma_f)$ (in $\mathbb{Z}^+ \times \mathbb{Z}^+$).
Since $\mathbb{Z}^+ \times E$ is arithmetical and $\Gamma_f$ is also arithmetical (even Diophantine), it
follows that $f^{-1}(E)$ is arithmetical.


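1.6. Let $S_i$ be sets with numberings $N_i$, i = 1, ..., r. A numbering
$N : S_1 \times \dots \times S_r \to \mathbb{Z}^+$ is said to be compatible with $(N_1,\dots,N_r)$ if the
projection $\mathrm{pr}_i : S_1 \times \dots \times S_r \to S_i$ is recursive relative to $(N, N_i)$ for all
i = 1, ..., r, and if the partial function


\[ N \circ (N_1^{-1} \times \dots \times N_r^{-1}) : \mathbb{Z}^+ \times \dots \times \mathbb{Z}^+ \to \mathbb{Z}^+ \]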


is recursive. In other words, the $N_i$-numbers of the coordinates are computed
from the N-number of the vector, and conversely.


1.7. Lemma.


(a) In the notation of 1.6, for any $(N_1,\dots,N_r)$ there exists a numbering N that
is compatible with them. For example, for $s_i \in S_i$, i = 1, ..., r, we may set


\[ N(s_1,\dots,s_r) = \tau^{(r)}(N_1(s_1),\dots,N_r(s_r)) \]


(for the definition of $\tau^{(r)}$, see Section 4.5 in Chapter V).

(b) If N is compatible with $(N_1,\dots,N_r)$, N is equivalent to M, and $N_i$ is equiv-
alent to $M_i$ for i = 1, ..., r, then M is compatible with $(M_1,\dots,M_r)$.

(c) If N is compatible with $(N_1,\dots,N_r)$ and M is compatible with $(N_1,\dots,N_r)$,
then N and M are equivalent. If N is compatible with $(N_1,\dots,N_r)$ and also
with $(M_1,\dots,M_r)$, then $N_i$ and $M_i$ are equivalent for all i = 1, ..., r.


What all this says is that the relationship of compatibility gives a
one-to-one correspondence between families consisting of r equivalence classes
of numberings of the sets $S_1,\dots,S_r$ and certain equivalence classes of
numberings of $S_1 \times \dots \times S_r$. This lemma is proved by mechanically checking the
definitions.


1.8. Let $A^l = A \times \dots \times A$ (l times), and let $S(A) = A^0 \cup A^1 \cup \dots \cup A^l \cup \dots$.
If A is an alphabet, then S(A) is the set of expressions in the alphabet. Here
$A^0 = \{\Lambda\}$ consists of the empty expression. The function $S(A) \to \mathbb{Z}^+$ that takes
the value p on each element of $A^p$ is called the length of the expression. The “ith
coordinate” partial function from $\mathbb{Z}^+ \times S(A)$ to $A^1$ given by $(i, (a_1,\dots,a_p)) \mapsto a_i$
is defined on the subset $\bigcup_{i=1}^{\infty} \{i\} \times (A^i \cup A^{i+1} \cup \dots)$. The “juxtaposition” function
from $S(A) \times S(A)$ to S(A) takes


\[ ((a_1,\dots,a_p),\ (b_1,\dots,b_q)) \ \text{to}\ (a_1,\dots,a_p,b_1,\dots,b_q). \]


A numbering N of S(A) is called admissible if the length function, the ith
coordinate function, and the juxtaposition function are partial recursive relative
to (N, id), ((id, N), N), and ((N, N), N), respectively. A numbering N of S(A)
is said to be compatible with a numbering $N_0$ of A if it is admissible and if the
restriction of N to $A^1$ is equivalent to $N_0$ on A (where we identify $A^1$ with A).

Here is the basic result of this section: 


1.9. Proposition. 


(a) If N is admissible, then any numbering equivalent to N is also admissible.

(b) If N is compatible with $N_0$, $N'$ is equivalent to N, and $N_0'$ is equivalent to
$N_0$, then $N'$ is compatible with $N_0'$.

(c) If N and $N'$ are both compatible with $N_0$, then they are equivalent.

(d) For any numbering $N_0$ of A, there exists a compatible numbering N of S(A),
whose equivalence class is uniquely determined by the class of $N_0$ because
of (c).


PROOF. We obtain (a) and (b) formally from Lemma 1.6. To prove (c), we find
the N-number of an expression from its $N'$-number as follows. Let $m * n =
N(N^{-1}(m)N^{-1}(n))$ (where the argument of N is the juxtaposition of the two
expressions $N^{-1}(m)$ and $N^{-1}(n)$). The partial function from $\mathbb{Z}^+ \times \mathbb{Z}^+$ to $\mathbb{Z}^+$
defined by $(m,n) \mapsto m * n$ is recursive and associative, since N is admissible.
Further, let $(k)_i = N(\text{the ith coordinate of } N^{-1}(k))$. The partial function
$\mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{Z}^+ : (k,i) \mapsto (k)_i$ is recursive for the same reason. We similarly
define $(k)'_i$ in terms of $N'$. Finally, let $l' : \mathbb{Z}^+ \to \mathbb{Z}^+$ be the partial function
“the length of $N'^{-1}(k)$.” It is also recursive.

Then we have


\[ N \circ N'^{-1}(k) = N \circ N'^{-1}((k)'_1) * \dots * N \circ N'^{-1}((k)'_{l'(k)}). \]

But the $N'$-numbers of the one-letter expressions $\{(k)'_i\}$ form a decidable subset
of $\mathbb{Z}^+$ (namely, the 1-level of the computable function $l'$). The restriction of
$N \circ N'^{-1}$ to this subset is a recursive function, since the restrictions of N and
$N'$ to $A^1$ are equivalent. We obtain (c) from this and from the recursiveness
of $*$, $(k)'_i$, and $l'$ (by applying induction on x to $*_{i=1}^{x}\, N \circ N'^{-1}((k)'_i)$ and then
substituting $x = l'(k)$).

We prove (d) using an explicit construction of Gödel (the idea of which,
incidentally, goes back to Leibniz).

($d_1$) Construction of N compatible with $N_0$:

\[ N(a_1,\dots,a_m) = p_1^{N_0(a_1)} \cdots p_m^{N_0(a_m)}, \]

where $p_1 = 2$, $p_2 = 3$, ... are the prime numbers. Here $N(\Lambda) = 1$. We verify that
N has the required properties.

($d_2$) N is a numbering. First of all, $N : S(A) \to \mathbb{Z}^+$ is an embedding because
$N_0 : A \to \mathbb{Z}^+$ is injective, and we have unique factorization in $\mathbb{Z}^+$.

We show that the image of N is decidable. In the first place, the set of
prime numbers in $\mathbb{Z}^+$ is decidable, since it is the 2-level of the everywhere
defined recursive function


\[ n \mapsto \text{the number of divisors of } n = \sum_{k=1}^{n} d(k,n) - n, \]


where (see §3 of Chapter V)


\[ d(k,n) = s\bigl((\mathrm{rem}(k,n) - k)^2 + 1\bigr) = \begin{cases} 2, & \text{if } k \mid n, \\ 1, & \text{otherwise,} \end{cases} \]


\[ s(1) = 2, \qquad s(\ge 2) = 1. \]
Thus, the function $i \mapsto p_i$ is recursive (see the proof of Proposition 5.2 in
Chapter V).
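
The divisor-counting argument can be checked numerically. The sketch below is only an illustration: it assumes that rem(k, n) denotes the remainder of n on division by k taken in {1, ..., k} (so that rem(k, n) = k exactly when k divides n, as the formula for d(k, n) requires), and the Python names `number_of_divisors`, `is_prime` and `nth_prime` are hypothetical.

```python
def rem(k, n):
    """Remainder of n on division by k, taken in {1, ..., k} (no zero in Z+)."""
    r = n % k
    return r if r != 0 else k


def s(x):
    """s(1) = 2 and s(x) = 1 for x >= 2."""
    return 2 if x == 1 else 1


def d(k, n):
    """2 if k divides n, 1 otherwise."""
    return s((rem(k, n) - k) ** 2 + 1)


def number_of_divisors(n):
    """Sum of d(k, n) over k = 1..n, minus n."""
    return sum(d(k, n) for k in range(1, n + 1)) - n


def is_prime(n):
    """n is prime exactly when it lies in the 2-level of number_of_divisors."""
    return number_of_divisors(n) == 2


def nth_prime(i):
    """p_1 = 2, p_2 = 3, ...; an unbounded search over a decidable set."""
    count, n = 0, 1
    while count < i:
        n += 1
        if is_prime(n):
            count += 1
    return n


if __name__ == "__main__":
    print([nth_prime(i) for i in range(1, 6)])
```

The search in `nth_prime` always terminates because there are infinitely many primes; this mirrors the way the text obtains the recursiveness of the map from i to p_i from the decidability of the set of primes.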
We now set
\[ f(n,i,y) = s\bigl((\mathrm{rem}(p_i^y, n) - p_i^y)^2 + 1\bigr). \]


This function is recursive, and hence so is the function of (n, i)


\[ \nu_i(n) = \min\{y \mid f(n,i,y) = 1\} = (\text{the power of } p_i \text{ which divides } n) + 1. \]

This implies that the “length” function is recursive:


\[ l(n) = \text{the number of prime divisors of } n = 2n - \sum_{i=1}^{n} s(\nu_i(n)) \]


(here $s(\nu_i(n))$ equals 1 if $p_i \mid n$ and 2 otherwise; automatically
$p_m \nmid n$ when m > n, since $p_m > m$).
Now let E be the image of $N_0$ in $\mathbb{Z}^+$. Then


\[ \text{image of } N = \{n \mid \forall i \le l(n),\ \nu_i(n) \in E + 1\}. \]


But the set $F = \{(i,n) \mid \nu_i(n) \in E + 1\}$ is decidable, since it is the preimage
of E + 1 under $\nu$, and applying the bounded universal quantifier preserves
the decidability. In fact, let $\chi_F(i,n) = 1$ if $(i,n) \in F$ and $\chi_F(i,n) = 2$ if
$(i,n) \notin F$. Then the image of N is the 2-level of the following function of
n: $s\bigl(\prod_{i=1}^{l(n)} \chi_F(i,n)\bigr)$.

($d_3$) N is admissible. We have already shown that the length function is
recursive. The ith coordinate function is represented by $[p_1^{\nu_i(n)}/p_1]$ (the integral
part). Finally, juxtaposition is represented by the function


\[ m * n = m \prod_{j=1}^{l(n)} p_{l(m)+j}^{\,\nu_j(n)-1}, \]


which is recursive by what has already been proved.
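
The prime-power numbering and the operations verified above can be tried out on small words. This is a minimal sketch, not the text's formal construction: the alphabet numbering `n0` is an arbitrary example, the function names are hypothetical, the length is computed by direct factorization rather than by the summation formula, and the coordinate function returns the number $2^{N_0(a_i)}$ of the one-letter expression $a_i$, which is what the definition of N assigns to it.

```python
def nth_prime(i):
    """Return p_i with p_1 = 2, p_2 = 3, ... (simple trial division)."""
    count, n = 0, 1
    while count < i:
        n += 1
        if all(n % q for q in range(2, int(n ** 0.5) + 1)):
            count += 1
    return n


def encode(letters, n0):
    """N(a_1 ... a_m) = p_1^N0(a_1) * ... * p_m^N0(a_m); the empty word gives 1."""
    result = 1
    for i, a in enumerate(letters, start=1):
        result *= nth_prime(i) ** n0[a]
    return result


def nu(i, n):
    """(The exponent of p_i in n) + 1, found by minimization over y."""
    p, y = nth_prime(i), 1
    while n % p ** y == 0:
        y += 1
    return y


def length(n):
    """l(n): the number of distinct primes dividing n."""
    count, i = 0, 1
    while n > 1:
        p = nth_prime(i)
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        i += 1
    return count


def coordinate(i, n):
    """Number of the one-letter expression at place i: 2^(nu_i(n) - 1)."""
    return 2 ** (nu(i, n) - 1)


def juxtapose(m, n):
    """m * n = m * prod_{j=1}^{l(n)} p_{l(m)+j}^(nu_j(n) - 1)."""
    result = m
    for j in range(1, length(n) + 1):
        result *= nth_prime(length(m) + j) ** (nu(j, n) - 1)
    return result


if __name__ == "__main__":
    n0 = {"a": 1, "b": 2, "c": 3}  # an arbitrary numbering of a toy alphabet
    m, n = encode("ab", n0), encode("ca", n0)
    print(m, n, juxtapose(m, n), encode("abca", n0))
```

Note that the functions are defined for every positive integer argument, not only for numbers of expressions; only on the image of N do they carry their syntactic meaning.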

We note that our number-theoretic functions are defined on all of Zt, not 
only on the Gödel numbers of any specific numbering. In what follows we shall
point out when such an extension of the domains of definition is possible only 
if there is a special reason for mentioning this possibility. 

($d_4$) N is compatible with $N_0$. The functions $i \mapsto 2^i$ and $y \mapsto \log_2(y)$
($y \in 2^{\mathbb{Z}^+}$) tell us how to go from one numbering to the other on one-letter
expressions. These functions are obviously recursive.

This completes the proof of Proposition 1.9. 


1.10. Concluding remarks. Proposition 1.9 shows that if we are given an
equivalence class of numberings of an alphabet A of a formal language, then this
uniquely determines an equivalence class of numberings of the set of
expressions S(A), of the set of texts S(S(A)), and so on, all of which are compatible
with the numberings of A in the given class. Hence, the set of recursive
operations and the set of decidable or enumerable relations are invariantly defined on
the expressions and texts. The only nonuniqueness that remains is the choice
of the equivalence class of the numbering of A.

In all cases of which the author is aware, this choice is also determined
canonically in the following way. Namely, A is realized as a decidable subset
of the expressions in some finite “protoalphabet” $A_0$, where decidability is
understood in the sense of any numbering of $S(A_0)$ that is compatible with
any numbering of $A_0$. It follows from Lemmas 1.3 and 1.5 and Proposition 1.9
applied to $A_0$ that the resulting class of numberings of A will not depend on
either the embedding of A in $S(A_0)$, the numbering of $A_0$, or even the choice
of $A_0$ (where we recall that if $A_0 \subset A_0'$ are finite, then $S(A_0) \subset S(A_0')$ is
decidable).

From this point of view, it is natural to consider the nine-letter alphabet
of SAr, which was described in §10 of Chapter II, to be a protoalphabet.
Then $x, x', x'', x''', \dots$ are elements of the “real alphabet.” Smullyan’s particular
numbering system is very convenient for proving Tarski’s theorem, but the
“undefinability of truth” in SAr does not depend on the special form of this
numbering, as should by now be completely clear.

More generally, any complete printed description of any alphabet A realizes 
A in the protoalphabet of available typographical symbols, which is of 
course finite, and thereby determines a canonical equivalence class of num- 
berings of A. 


2 Incompleteness Principles 


2.1. Gödel’s theorem on the incompleteness of formal theories can be given
many precise formulations, none of which entirely exhausts its content. In this 
section, using the results obtained in §1, we shall try to separate the conceptual 
aspects of the theorem from the technical details needed to prove it for various 
languages. 


2.2. Let A be a finite or countable alphabet with its canonical equivalence class
of numberings, and let S(A) be the set of expressions in A. We suppose that
the following two subsets of S(A) have somehow been defined:


(a) $T \subset S(A)$, the set of “true” expressions. For example, we might have been
given a language with A as its alphabet, some sort of semantics for the
language, and a truth function.

(b) $D \subset S(A)$, the set of “provable” or “deducible” expressions. This set might
be described by giving “axioms” and “rules of deduction,” or in some other
way. We shall always assume that $D \subset T$, as the terminology suggests
(it is possible to prove only what is true).


There is every reason to expect that if D and T have been constructed
“in a natural way” in the process of formalizing some fragment of modern 
mathematics, then the following principles hold true. 


2.3. The set D is enumerable. The intuitive arguments to support this assertion
are as follows. Suppose that the “provable” expressions are those for which
“proofs” exist. Here “proofs” are certain texts that, perhaps, are written
in another alphabet B, i.e., they are elements of S(S(B)). (For example,
theorems in $L_1$Ar may be proved in $L_1$Set.) One minimal requirement for formal
mathematical proofs is that it must be possible mechanically to determine that
they are proofs, i.e., they must form a decidable subset of S(S(B)). (Here it
would actually be sufficient to require that the set of “proofs” be enumerable.)
Another unavoidable requirement is that from every proof we must be able
to obtain mechanically the “expression proved” in S(A). In other words, the
partial function from S(S(B)) to S(A) given by “proof” $\mapsto$ “expression proved”
must be (semi)computable. But then the image of this function is enumerable.
In §5 we show that the set of deducible formulas in $\mathcal{L}_1$ is enumerable, in
accordance with these informal considerations.

We note that a time aspect has implicitly entered into the discussion. A 
“proof” is understood to mean a “proof using the means accepted at the present 
time and (semi)identifiable as being accepted.” If, for example, we introduce a 
new axiom of set theory and it becomes widely accepted, then the concept of a 
proof becomes broader, as happened with the axiom of choice (or, rather, the 
principle of transfinite induction, Zorn’s lemma, ...). See the discussion in §7.


2.4. The set T is not enumerable if the semantics of truth is rich enough to 
include elementary arithmetic. We clearly have in mind some version of Tarski’s 
theorem, which, in fact, even tells us that T is not an arithmetical set. In the 
next section we give several precise formulations of this principle. (See also 
Sections 7.3-7.4 below.) 


2.5. Gödel’s incompleteness theorem (General form). All formal theories
of mathematics satisfy the principles 2.3 and 2.4. Therefore, if a theory is 
sufficiently rich, it always contains true expressions that are not provable. 


3 Nonenumerability of True Formulas 


The following criteria are all variations on a single theme, even if this is not 
obvious at first, namely, “self-reference, or the diagonal process.” 


3.1. The language SAr. We refer the reader to §10 of Chapter II for the
description of this language and its standard interpretation. In §11 of Chapter II we
showed that the set of numbers of true formulas in Smullyan’s numbering
system is nonarithmetical. This set is a fortiori nonenumerable, since enumerable
sets are even Diophantine.


3.2. The language $L_1$Ar. Here we give two versions of the argument, one of
which gives the stronger result and the other of which gives the more
concrete result. A third version, which is closer to Gödel’s original proof, will be
described in §7.

(a) Tarski’s theorem for $L_1$Ar. The proof that the set of true formulas in
$L_1$Ar is nonarithmetical can be reduced to Tarski’s theorem for SAr in the
following way. In the first place, the sets of formulas in $L_1$Ar and SAr are
decidable in the set of all expressions (this will be shown for $L_1$Ar in §4).


In the second place, the translation map tr : {formulas in SAr} $\to$ {formulas in
$L_1$Ar}, which was described in §10 of Chapter II, is recursive (as is easily shown
using the arguments in the next section). Since the map tr preserves the truth
function, we have $T_S = \mathrm{tr}^{-1}(T_{L_1})$ in the obvious notation. But then, if $T_{L_1}$
were arithmetical, it would follow that $T_S$ is also arithmetical (see the proof of
Lemma 1.5), which contradicts Tarski’s theorem for SAr. It would be a useful
exercise for the reader, after first reading §4, to carry out this proof in complete
detail.

The following argument is simpler and more precise, but it only shows that
$T_{L_1}$ is nonenumerable, and not that it is nonarithmetical.

(b) Let $E \subset \mathbb{Z}^+$ be an enumerable but undecidable set (which exists by §5 of
Chapter V). Let E be defined by the formula P(x) in $L_1$Ar, which has one free
variable x. For $n \ge 2$ we set $\bar{n} = +(\bar{1}, +(\bar{1}, \dots +(\bar{1},\bar{1})\dots))$, which is the term-name
for the integer n in the obvious canonical $\mathcal{L}_1$-type notation. We consider the
family of closed formulas $\{\neg(P(\bar{n})) \mid n \in \mathbb{Z}^+\}$ in $L_1$Ar.


3.3. Proposition.


(a) The function $\mathbb{Z}^+ \to$ {formulas in $L_1$Ar} given by $n \mapsto \neg(P(\bar{n}))$ is recursive.
(b) The set $\{n \mid \neg(P(\bar{n})) \text{ is true}\}$ is nonenumerable.


Corollary. $T_{L_1}$ is nonenumerable; more precisely, the set of true formulas in
the family $\{\neg(P(\bar{n}))\}$ is nonenumerable.


(If $T_{L_1}$ were enumerable, its preimage in $\mathbb{Z}^+$ would also be enumerable.)


PROOF.


(a) Let the formula $\neg(P(x))$ have the form $R_1\, x\, R_2\, x \cdots x\, R_{a+1}$, where
x does not occur in the expressions $R_i$. Using the same notation as in the
proof of Proposition 1.9, for a fixed numbering N of the set of expressions with
juxtaposition function $*$ we have


\[ N(\neg(P(\bar{n}))) = N(R_1) * N(\bar{n}) * N(R_2) * \cdots * N(R_{a+1}). \]


Hence, it suffices to show that the function $n \mapsto N(\bar{n})$ is recursive. But since
$\overline{n+1} = +(\bar{1},\bar{n})$, it follows that for $n \ge 1$,


\[ N(\overline{n+1}) = N(+) * N(\text{“(”}) * N(\bar{1}) * N(\bar{n}) * N(\text{“)”}), \]


which expresses $N(\overline{n+1})$ recursively in terms of $N(\bar{n})$.
(b) $\{n \mid \neg(P(\bar{n})) \in T_{L_1}\} = \mathbb{Z}^+ \setminus E$ by the definition of the formula P(x)
defining E. But the complement of E is nonenumerable, since E is
undecidable.
The proposition and the corollary are proved.
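
The recursion used in part (a) can be mirrored on strings. The following is a rough sketch under several assumptions that are not in the text: symbols are single characters, the arguments of + are simply juxtaposed inside the parentheses, string concatenation stands in for the juxtaposition operation on numbers, and the names `numeral` and `negated_instance` as well as the sample pieces are hypothetical.

```python
def numeral(n):
    """Term-name for n >= 1: numeral(1) = "1", numeral(n + 1) = "+(1" + numeral(n) + ")"."""
    if n < 1:
        raise ValueError("numerals are defined only for positive integers")
    term = "1"
    for _ in range(n - 1):
        term = "+(" + "1" + term + ")"
    return term


def negated_instance(pieces, n):
    """Insert numeral(n) between the fixed pieces R_1, ..., R_{a+1}."""
    return numeral(n).join(pieces)


if __name__ == "__main__":
    print(numeral(3))
    # Pieces of a schematic formula with a single occurrence of x.
    print(negated_instance(["~(P(", "))"], 2))
```

Each step of the loop performs a fixed number of juxtapositions, which is why the corresponding map on numbers is obtained by ordinary primitive recursion.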


3.4. Languages at least as rich as $L_1$Ar. Let L be an arbitrary language with
a (finite or countable) alphabet A, in which we are given a set T of “true”
expressions. We suppose that L is no poorer than a language of arithmetic in
the following sense: There exists a translation map


tr : {formulas in $L_1$Ar} $\to$ {expressions in A}

that takes $T_{L_1}$ to T, takes the complement of $T_{L_1}$ to the complement of T, and
is recursive.

Then T is nonenumerable.

Such a translation map can be constructed for $L_1$Set, for example.
Proposition 3.3 shows that, actually, we need only know how to translate into L
the formulas in the family $\neg(P(\bar{n}))$; this allows us to use a very modest language
of arithmetic.


3.5. Remarks
(a) The series of Diophantine problems “Is $P(\bar{n})$ true?” i.e., “Does the
Diophantine equation $F(n; x_1,\dots,x_r) = 0$ have a solution in $\mathbb{Z}^+$?” (where F is
a suitable polynomial with integer coefficients; see Chapter VI) has the property
that no finitely describable collection of means of proof is
adequate to answer this series of questions completely. One might say that
even the theory of Diophantine equations is infinitely complicated.

(b) In some sense any problem in mathematics reduces to a Diophantine
problem. In fact, after translating the problem into a suitable formal language,
we may just ask, “Is the formula P or the formula $\neg P$ provable?” But this is
precisely the same as asking whether the number of P (the number of $\neg P$) belongs
to the enumerable set D of provable formulas, i.e., whether the Diophantine
equation corresponding to D in the given series is decidable.

This gives somewhat unexpected support for Gauss’s opinion regarding the 
queenly status of arithmetic. There even exists a “queen of the Diophantine 
equations” whose graph projects onto the set of numbers of formulas in $L_1$Set
that are deducible from the Zermelo—Fraenkel axioms. 

But of course, we normally ask “Is P true?” and not “Is P provable?” From
this point of view, the most creative activity in mathematics is the discovery of
new principles of proof that do not reduce to the “legacy of the past” and that 
again must be taken on faith. Set theory as a whole was the most recent such 
principle in the modern development of mathematics. The dramatic history 
of its creation and of the disputes surrounding its acceptance is worthy of a 
discovery of this magnitude. 

It is amazing that within formal mathematics it is possible to say something 
about such informal things. See also §7 below. 


4 Syntactic Analysis


4.1. This section contains the preliminary technical material that will be needed
in §5, when we prove that the set of deducible formulas in a language of $\mathcal{L}_1$ is
enumerable.

Let L be a fixed language in $\mathcal{L}_1$ having a finite or countable alphabet A. In
order to shorten the technical work somewhat, we assume that we are working
with a dialect that contains only the connectives $\neg$ and $\to$ and the quantifier $\forall$.
This is not in any sense essential. As in §1, we have a canonical equivalence class
of numberings of A, which determines numberings of S(A), S(S(A)), and so on.
The terms “recursive,” “decidable,” etc. will be understood to refer to this class.
Thus, we may omit explicit mention of the numbering in the statements of the
basic results. But in the proofs it will be more convenient to work directly with
a numbering. We therefore fix one of the numberings $N : S(A) \to \mathbb{Z}^+$ with
juxtaposition function $*$, length function l, and ith coordinate function $(k)_i$, as
in the proof of Proposition 1.9. We shall assume that $m * n > \max(m, n)$, i.e.,
the number of any part of an expression is strictly less than the number of the
whole expression. Such an N is called a Gödel numbering.

In addition to the conditions given in §1, we require that N satisfy the
following conditions regarding recognition of the syntactic characteristics of the 
symbols of the alphabet: 


(a) The sets of variables, of constants, of operations, and of relations in A are 
decidable. 


(b) The “degree” function on the set of operations and relations is recursive. 


We are now ready to begin. But before reading further the reader is advised 
to review §1 of Chapter II.


4.2. The partial function from $S(A) \times \mathbb{Z}^+$ to $\mathbb{Z}^+$ given by


(an expression P, i) $\mapsto$ the place in P containing the right parenthesis
that corresponds to the left parenthesis in the ith place in P


is computable, i.e., it is recursive and has a decidable domain of definition.
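
The map in 4.2 can be sketched directly on strings rather than on numbers. The code below is an illustration only: it assumes that every symbol is a single character, the name `par` is hypothetical, and the value 1 is used to signal that no matching parenthesis exists (place 1 can never hold the right parenthesis matching a left parenthesis).

```python
def par(expr, i):
    """Place of the ')' matching the '(' at 1-based place i, or 1 if there is none.

    The answer is the least j >= i such that places i..j of expr contain
    equally many '(' and ')'.
    """
    if i < 1 or i > len(expr) or expr[i - 1] != "(":
        return 1
    opened = closed = 0
    for j in range(i, len(expr) + 1):
        if expr[j - 1] == "(":
            opened += 1
        elif expr[j - 1] == ")":
            closed += 1
        if opened == closed:
            return j
    return 1


if __name__ == "__main__":
    print(par("+(1+(11))", 2), par("+(1+(11))", 5))
```

The loop is a bounded search over the places of the expression, so it terminates on every input.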


PROOF. It will be convenient to use the following notation: if Q is a statement
about integers in $\mathbb{Z}^+$, then


\[ \|Q\| = \begin{cases} 1, & \text{if } Q \text{ is true,} \\ 2, & \text{if } Q \text{ is false.} \end{cases} \]


This is a truth function that has been adjusted so as to take values in $\mathbb{Z}^+$, which
does not contain zero.

We construct a function $\mathrm{Par}(k,i) : \mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{Z}^+$ as follows: if $(k)_i$ is not
defined, or if $(k)_i \ne N(\text{“(”})$, or if $(k)_i = N(\text{“(”})$ but for all $j \in [i, l(k)]$,
$\sum_{m=i}^{j} \|(k)_m = N(\text{“(”})\| \ne \sum_{m=i}^{j} \|(k)_m = N(\text{“)”})\|$, let Par(k, i) = 1; otherwise, let
\[ \mathrm{Par}(k,i) = \min\Bigl\{j \Bigm| j \le l(k) \text{ and } \sum_{m=i}^{j} \|(k)_m = N(\text{“(”})\| = \sum_{m=i}^{j} \|(k)_m = N(\text{“)”})\|\Bigr\}. \]
Obviously, when restricted to $N(S(A)) \times \mathbb{Z}^+$, the function Par(k, i) gives
the place in the expression $N^{-1}(k)$ containing the “)” that corresponds to the
“(” in the ith place if this is possible, and gives 1 when this is not possible.
(Compare with Lemma 1.2 in §1 of Chapter II.) Hence, it suffices to show that
Par(k, i) is recursive. But Par(k, i) has been defined by gluing together a finite
number (four) of recursive functions having decidable domains of definition (by
the properties of N). Thus, Par(k, i) is recursive.


4.3. The partial function $S(A) \to \mathbb{Z}^+$ given by (an expression P) $\mapsto$ (the number
of terms in L that are juxtaposed to get P) is computable.

We recall that this number is uniquely defined (§1 of Chapter II).
PROOF. We first construct a formula that defines the function

\[ LT(k) = \begin{cases} l(k) + 1, & \text{if } N^{-1}(k) \text{ is not a juxtaposition of terms;} \\ \text{the number of terms whose juxtaposition is } N^{-1}(k), & \text{otherwise,} \end{cases} \]

from $\mathbb{Z}^+$ to $\mathbb{Z}^+$ recursively in terms of its values on smaller values of the
argument. The way to carry out this syntactic analysis of $N^{-1}(k)$ can be described
verbally as follows: first see whether $N^{-1}((k)_1)$ is a variable or constant and, if
it is, whether $N^{-1}((k)_2 * \cdots * (k)_{l(k)})$ is a juxtaposition of terms; if $N^{-1}((k)_1)$
is not a variable or constant, check whether it is an operation, and if it is,
whether it is followed by “(”, whether there is a corresponding “)”, whether a
juxtaposition of the required number of terms lies between the “(” and the “)”,
and whether “)” is followed by a juxtaposition of terms.

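To describe this procedure systematically, we set


\[ f_1(k) = \begin{cases} (k)_2 * \cdots * (k)_{l(k)}, & \text{if } l(k) \ge 2; \\ 1, & \text{otherwise;} \end{cases} \]


\[ f_2(k) = \begin{cases} (k)_3 * \cdots * (k)_{\mathrm{Par}(k,2)-1}, & \text{if } 4 \le \mathrm{Par}(k,2); \\ 1, & \text{otherwise;} \end{cases} \]


\[ f_3(k) = \begin{cases} (k)_{\mathrm{Par}(k,2)+1} * \cdots * (k)_{l(k)}, & \text{if } 1 < \mathrm{Par}(k,2) < l(k); \\ 1, & \text{otherwise.} \end{cases} \]


All of these functions are recursive.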
We now write the following recipe for computing LT(k) recursively:


$l(k) = 1$ and $N^{-1}(k)$ is a variable $\Rightarrow$ $LT(k) = 1$;
$l(k) = 1$ and $N^{-1}(k)$ is a constant $\Rightarrow$ $LT(k) = 1$;
$l(k) = 1$ and $N^{-1}(k)$ is neither a variable nor a constant $\Rightarrow$ $LT(k) = 2$;

$l(k) > 1$ and $N^{-1}((k)_1)$ is a variable $\Rightarrow$ $LT(k) = 1 + LT(f_1(k))$;
$l(k) > 1$ and $N^{-1}((k)_1)$ is a constant $\Rightarrow$ $LT(k) = 1 + LT(f_1(k))$;
$l(k) > 1$, $N^{-1}((k)_1)$ is an operation, $(k)_2 = N(\text{“(”})$,

$4 \le \mathrm{Par}(k,2) = l(k)$,

degree $N^{-1}((k)_1) = LT(f_2(k)) \le l(f_2(k))$ $\Rightarrow$ $LT(k) = 1$;
$l(k) > 1$, $N^{-1}((k)_1)$ is an operation, $(k)_2 = N(\text{“(”})$,

$4 \le \mathrm{Par}(k,2) < l(k)$,

degree $N^{-1}((k)_1) = LT(f_2(k)) \le l(f_2(k))$,

$LT(f_3(k)) \le l(f_3(k))$ $\Rightarrow$ $LT(k) = 1 + LT(f_3(k))$;

$l(k) > 1$, and none of the previous additional conditions hold

$\Rightarrow$ $LT(k) = 1 + l(k)$.

To show that LT is recursive, we first note that for each of the above eight
alternatives, we can easily construct a recursive function $h_i(k,x,y,z)$ with the
following property:


\[ \|k \text{ satisfies the ith alternative}\| = h_i(k, LT(f_1(k)), LT(f_2(k)), LT(f_3(k))), \]
and we can also construct a recursive function $v_i(k,x,y,z)$ with the property
that k satisfies the ith alternative $\Rightarrow$
\[ LT(k) = v_i(k, LT(f_1(k)), LT(f_2(k)), LT(f_3(k))). \]
We therefore have the equation


\[ LT(k) = 2\sum_{i=1}^{8} v_i(k, LT(f_1(k)), LT(f_2(k)), LT(f_3(k))) - \sum_{i=1}^{8} (h_i v_i)(k, LT(f_1(k)), LT(f_2(k)), LT(f_3(k))). \]


Since $f_i(k) < k$ for k > 1, this formula allows us successively to compute the
values of LT(k), starting with LT(1). But the recursion here computes the value
at k not in terms of the value at k − 1, but in terms of several earlier values. It
is this that presents the basic difficulty in showing that the syntactic functions
are recursive. We now describe the device for overcoming this difficulty here
and in all future cases.

In general, let $\varphi_1(k), \ldots, \varphi_s(k)$ be recursive functions having the property
that $\varphi_i(k) \le k$ for all $i \le s$ and $k \ge 2$. Further, let $h(x_1,\ldots,x_n,k,y_1,\ldots,y_s)$
be a recursive function, and let $g(x_1,\ldots,x_n,k)$ be defined by the relations

\[
\begin{aligned}
g(x_1,\ldots,x_n,1) &= \text{some known recursive function},\\
g(x_1,\ldots,x_n,k+1) &= h(x_1,\ldots,x_n,k,\ g(x_1,\ldots,x_n,\varphi_1(k)),\ \ldots,\ g(x_1,\ldots,x_n,\varphi_s(k))).
\end{aligned}
\]

Using the juxtaposition function $*$, we let

\[
G(x_1,\ldots,x_n,k) = \mathop{*}_{i=1}^{k}\, g(x_1,\ldots,x_n,i).
\]

Since

\[
g(x_1,\ldots,x_n,i) = (G(x_1,\ldots,x_n,k))_i
\]

for all $i \le l(G(x_1,\ldots,x_n,k)) = k$, and in particular for the greatest such $i$, it
follows that to verify that $g$ is recursive, it suffices to show that $G$ is recursive.
But for $k \ge 2$ we have

\[
\begin{aligned}
G(x_1,\ldots,x_n,k+1) &= G(x_1,\ldots,x_n,k) * g(x_1,\ldots,x_n,k+1)\\
&= G(x_1,\ldots,x_n,k) * h\bigl(x_1,\ldots,x_n,k,\ (G(x_1,\ldots,x_n,k))_{\varphi_1(k)},\ \ldots,\ (G(x_1,\ldots,x_n,k))_{\varphi_s(k)}\bigr),
\end{aligned}
\]

which is in the standard form for a recursive equation.

If we apply this device to LT, setting $n = 0$, $s = 3$, and $\varphi_i(k) = f_i(k+1)$,
we obtain the recursiveness of LT.
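The device can be illustrated by a short Python sketch. It is only an analogy under stated assumptions: the list `history` plays the role of $G$ (its $i$th entry is $g(i)$), the parameters $x_1,\ldots,x_n$ are dropped (the case $n = 0$), and the names `course_of_values`, `h`, and `phis` are hypothetical. The sketch also assumes $1 \le \varphi_i(k) \le k$ for every $k \ge 1$.

```python
def course_of_values(h, phis, g1, k_max):
    """Return [g(1), ..., g(k_max)], where g(1) = g1 and
    g(k+1) = h(k, g(phi_1(k)), ..., g(phi_s(k))) with 1 <= phi_i(k) <= k."""
    history = [g1]                      # history[i-1] == g(i); this is the analogue of G
    for k in range(1, k_max):
        earlier = []
        for phi in phis:
            j = phi(k)
            assert 1 <= j <= k, "phi_i(k) must point at an already computed value"
            earlier.append(history[j - 1])
        history.append(h(k, *earlier))  # one-step recursion on the whole history
    return history


# Illustrative instance: a Fibonacci-like sequence that looks two steps back.
fib_like = course_of_values(
    h=lambda k, a, b: 1 if k == 1 else a + b,
    phis=[lambda k: k, lambda k: max(k - 1, 1)],
    g1=1,
    k_max=6,
)
```

The point of the design is that each new value depends only on `history`, so the recursion on the history is an ordinary recursion on the previous step, even though $g$ itself looks back at several earlier values.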


Corollary. The set of terms is decidable. 
In fact, this set is the 1-level of the computable function LT. 
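The recipe for LT in 4.3 can be tried out on a toy language. The following Python sketch works on lists of symbols rather than on their numbers and uses ordinary recursion in place of the juxtaposition device. The alphabet is an assumption made for illustration: `VARIABLES`, `CONSTANTS`, and `DEGREE` are hypothetical and do not come from the text. The function returns the number of terms when the input is a juxtaposition of terms and the length plus one otherwise, so the terms are exactly the inputs of value 1.

```python
VARIABLES = {"x", "y"}            # hypothetical variables
CONSTANTS = {"0", "1"}            # hypothetical constants
DEGREE = {"S": 1, "+": 2}         # hypothetical operations with their degrees


def par(tokens, i):
    """1-based position of the ')' matching the '(' at 1-based position i (0 if none)."""
    if not (1 <= i <= len(tokens)) or tokens[i - 1] != "(":
        return 0
    depth = 0
    for j in range(i, len(tokens) + 1):
        if tokens[j - 1] == "(":
            depth += 1
        elif tokens[j - 1] == ")":
            depth -= 1
            if depth == 0:
                return j
    return 0


def lt(tokens):
    """Number of terms if tokens is a juxtaposition of terms, else len(tokens) + 1."""
    n = len(tokens)
    if n == 0:
        return 1                  # convention of this sketch only
    head = tokens[0]
    if n == 1:
        return 1 if head in VARIABLES or head in CONSTANTS else 2
    if head in VARIABLES or head in CONSTANTS:
        return 1 + lt(tokens[1:])                 # tail plays the role of f_1
    if head in DEGREE and tokens[1] == "(":
        p = par(tokens, 2)
        if p >= 4:
            args = tokens[2:p - 1]                # positions 3..p-1, the role of f_2
            if DEGREE[head] == lt(args) <= len(args):
                if p == n:
                    return 1
                rest = tokens[p:]                 # positions p+1..n, the role of f_3
                if lt(rest) <= len(rest):
                    return 1 + lt(rest)
    return 1 + n


def is_term(tokens):
    """A term is an expression on the 1-level of lt."""
    return lt(tokens) == 1
```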


4.4. The set of atomic formulas is decidable. 
In fact, 


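\[
\begin{aligned}
N^{-1}(k) \text{ is an atomic formula} \iff{} & N^{-1}((k)_1) \text{ is a relation},\ (k)_2 = N(\text{“(”}),\\
& \mathrm{Par}(k,2) = l(k) \ge 4,\\
& \text{and degree } N^{-1}((k)_1) = \mathrm{LT}(f_2(k)) \le l(f_2(k)),
\end{aligned}
\]
where $f_2(k)$ was defined in 4.3.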


4.5. The set of formulas is decidable. 
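In fact, in our dialect, which has been simplified to include only $\neg$, $\to$, and
$\forall$, we have

\[
\begin{aligned}
N^{-1}(k) \text{ is a formula} \iff{} & N^{-1}(k) \text{ is an atomic formula, or is of the form } \neg(P),\\
& (P) \to (Q), \text{ or } \forall x(P), \text{ where } P \text{ and } Q \text{ are formulas and } x \text{ is a variable.}
\end{aligned}
\]

Using the procedure in 4.3, we define the recursive functions

\[
f_4(k) = \begin{cases} (k)_3 * \cdots * (k)_{l(k)-1}, & \text{if } l(k) \ge 4;\\ 1, & \text{otherwise;} \end{cases}
\]

\[
f_5(k) = \begin{cases} (k)_2 * \cdots * (k)_{\mathrm{Par}(k,1)-1}, & \text{if } \mathrm{Par}(k,1) \ge 3;\\ 1, & \text{otherwise;} \end{cases}
\]

\[
f_6(k) = \begin{cases} (k)_{\mathrm{Par}(k,1)+3} * \cdots * (k)_{l(k)-1}, & \text{if } 3 \le \mathrm{Par}(k,1) < l(k) - 1;\\ 1, & \text{otherwise;} \end{cases}
\]

\[
f_7(k) = \begin{cases} (k)_4 * \cdots * (k)_{l(k)-1}, & \text{if } l(k) \ge 5;\\ 1, & \text{otherwise;} \end{cases}
\]

\[
\mathrm{At}(k) = \begin{cases} 1, & \text{if } N^{-1}(k) \text{ is an atomic formula;}\\ 2, & \text{otherwise.} \end{cases}
\]

The function

\[
\mathrm{Fm}(k) = \begin{cases} 1, & \text{if } N^{-1}(k) \text{ is a formula,}\\ 2, & \text{otherwise} \end{cases}
\]

is computed using the following recursive relation (where $s(1) = 1$ and $s(k) = 2$
for $k \ge 2$):

\[
\begin{aligned}
\mathrm{Fm}(k) = s \circ \min\bigl\{ & \mathrm{At}(k);\\
& \|(k)_1 = N(\text{“}\neg\text{”})\| \cdot \|(k)_2 = N(\text{“(”})\| \cdot \|l(k) \ge 4\| \cdot \mathrm{Fm}(f_4(k));\\
& \|(k)_1 = N(\text{“(”})\| \cdot \|\mathrm{Par}(k,1) \ge 3\| \cdot \mathrm{Fm}(f_5(k))\\
& \quad \cdot \|(k)_{\mathrm{Par}(k,1)+1} = N(\text{“}\to\text{”})\| \cdot \|(k)_{\mathrm{Par}(k,1)+2} = N(\text{“(”})\|\\
& \quad \cdot \|\mathrm{Par}(k, \mathrm{Par}(k,1)+2) = l(k)\| \cdot \mathrm{Fm}(f_6(k));\\
& \|(k)_1 = N(\text{“}\forall\text{”})\| \cdot \|(k)_2 = N(\text{a variable})\| \cdot \|(k)_3 = N(\text{“(”})\|\\
& \quad \cdot \|\mathrm{Par}(k,3) = l(k) \ge 5\| \cdot \mathrm{Fm}(f_7(k)) \bigr\}.
\end{aligned}
\]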


Fm(k) is now shown to be recursive using the device described in 4.3. 


Corollary. The sets of formulas of the form $\neg(P)$, $(P) \to (Q)$, and $\forall x(P)$ are
decidable. 


4.6. The following function from $S(A) \times \mathbf{Z}^+ \times S(A)$ to $S(A)$ is computable:
$(P, i, Q) \mapsto$ the result of substituting $P$ for the $i$th symbol in $Q$.

We set

\[
\mathrm{Sub}(k,i,m) = \begin{cases} (m)_1 * \cdots * (m)_{i-1} * k * (m)_{i+1} * \cdots * (m)_{l(m)}, & \text{if } i \le l(m);\\ 1, & \text{otherwise.} \end{cases}
\]

This function is clearly recursive, and coincides with the required map on the
set of $(k,i,m)$ with $k, m \in N(S(A))$.


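4.7. The following relation in $\mathbf{Z}^+ \times S(A) \times S(A)$ is decidable: “the one-letter
expression $x$ is a free variable in the $i$th place in the formula $P$.”
In fact, we set

\[
\mathrm{Fr}(i,k,l) = \begin{cases} 1, & \text{if the condition in 4.7 holds for } P = N^{-1}(k) \text{ and } (x) = N^{-1}(l);\\ 2, & \text{otherwise.} \end{cases}
\]

Then we have

\[
N^{-1}(k) \text{ is not a formula, or } N^{-1}(l) \text{ is not a variable, or } i > l(k) \;\Rightarrow\; \mathrm{Fr}(i,k,l) = 2.
\]

Now suppose that $N^{-1}(k)$ is a formula, $N^{-1}(l)$ is a variable, and $i \le l(k)$. Then
the following alternatives remain:

\[
\begin{aligned}
& l \ne (k)_i \;\Rightarrow\; \mathrm{Fr}(i,k,l) = 2;\\
& l = (k)_i,\ \mathrm{At}(k) = 1 \;\Rightarrow\; \mathrm{Fr}(i,k,l) = 1;\\
& l = (k)_i,\ N^{-1}(k) \text{ has the form } \neg(P) \;\Rightarrow\; \mathrm{Fr}(i,k,l) = \mathrm{Fr}(i, f_4(k), l);\\
& l = (k)_i,\ N^{-1}(k) \text{ has the form } (P) \to (Q),\ i < \mathrm{Par}(k,1) \;\Rightarrow\; \mathrm{Fr}(i,k,l) = \mathrm{Fr}(i, f_5(k), l);\\
& l = (k)_i,\ N^{-1}(k) \text{ has the form } (P) \to (Q),\ i > \mathrm{Par}(k,1) + 2 \;\Rightarrow\; \mathrm{Fr}(i,k,l) = \mathrm{Fr}(i, f_6(k), l);\\
& l = (k)_i,\ N^{-1}(k) \text{ has the form } \forall x(P),\ (k)_2 = l \;\Rightarrow\; \mathrm{Fr}(i,k,l) = 2;\\
& l = (k)_i,\ N^{-1}(k) \text{ has the form } \forall x(P),\ (k)_2 \ne l \;\Rightarrow\; \mathrm{Fr}(i,k,l) = \mathrm{Fr}(i, f_7(k), l).
\end{aligned}
\]

Here the functions $f_4$, $f_5$, $f_6$, and $f_7$ were defined in 4.5. The rest of the proof that
Fr is recursive follows the same procedure as in 4.4 and 4.5.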


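4.8. The set $\{(x,P,t) \mid x$ is a variable, $P$ is a formula, $t$ is a term, and $x$ does
not bind $t$ in $P\}$ is decidable.
In terms of the numbers $(i,k,m)$, this condition means that

\[
\begin{aligned}
\forall j \le l(k)\ \bigl\{ & \text{either } (k)_j \ne i,\ \text{or else } (k)_j = i \wedge \mathrm{Fr}(j,k,i) = 2,\\
& \text{or else } (k)_j = i \wedge \mathrm{Fr}(j,k,i) = 1\\
& \quad \wedge\ \forall n \in [1, l(m)]\ \bigl(\mathrm{Fr}(j+n-1,\ \mathrm{Sub}(m,j,k),\ (\mathrm{Sub}(m,j,k))_{j+n-1}) = \|(m)_n \text{ is a variable}\|\bigr) \bigr\}.
\end{aligned}
\]

That is, if $t$ is substituted in place of any free occurrence of $x$ in $P$, all the
variables in $t$ remain free.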


4.9. The following partial function is computable: $(x, P, t) \mapsto$ the result of
substituting $t$ in place of all free occurrences of $x$ in $P$.
Let $(i,k,m)$ be the numbers of $x$, $P$, and $t$. We set

\[
f(j,k,i,m) = \begin{cases} (k)_j, & \text{if } \mathrm{Fr}(j,k,i) = 2;\\ m, & \text{if } \mathrm{Fr}(j,k,i) = 1. \end{cases}
\]

This is a recursive function. We further set

\[
\mathrm{Subt}(i,k,m) = \mathop{*}_{j=1}^{l(k)} f(j,k,i,m).
\]

This is the number of the expression obtained by substituting $t$ in place of all
free occurrences of $x$ in $P$.
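The recursions in 4.7 and 4.9 can be sketched in Python on lists of symbols instead of numbers. This is an illustration under assumptions, not the construction of the text: the input is assumed to be a well-formed formula of the dialect $\neg(P)$, $(P) \to (Q)$, $\forall x(P)$; every other expression is treated as atomic; and the helper names `par`, `free_positions`, and `subst_free` are hypothetical. The sketch carries an explicit offset so that positions found in a subformula are reported as positions in the whole formula.

```python
def par(tokens, i):
    """1-based position of the ')' matching the '(' at 1-based position i (0 if none)."""
    if not (1 <= i <= len(tokens)) or tokens[i - 1] != "(":
        return 0
    depth = 0
    for j in range(i, len(tokens) + 1):
        if tokens[j - 1] == "(":
            depth += 1
        elif tokens[j - 1] == ")":
            depth -= 1
            if depth == 0:
                return j
    return 0


def free_positions(tokens, var, offset=0):
    """Set of 1-based positions at which var occurs freely in the formula tokens."""
    n = len(tokens)
    if tokens[0] == "¬":                      # ¬(P): P occupies positions 3..n-1
        return free_positions(tokens[2:n - 1], var, offset + 2)
    if tokens[0] == "∀":                      # ∀x(P): P occupies positions 4..n-1
        if tokens[1] == var:
            return set()                      # var is bound throughout
        return free_positions(tokens[3:n - 1], var, offset + 3)
    if tokens[0] == "(":                      # (P)→(Q)
        p = par(tokens, 1)
        left = free_positions(tokens[1:p - 1], var, offset + 1)
        right = free_positions(tokens[p + 2:n - 1], var, offset + p + 2)
        return left | right
    # atomic formula: every occurrence of var is free
    return {offset + j + 1 for j, t in enumerate(tokens) if t == var}


def subst_free(tokens, var, term):
    """Replace every free occurrence of var in tokens by the symbol list term."""
    free = free_positions(tokens, var)
    out = []
    for j, t in enumerate(tokens, start=1):
        out.extend(term if j in free else [t])
    return out


# (=(x y)) → (∀x(=(x y))), written as a list of symbols
EXAMPLE = list("(=(xy))→(∀x(=(xy)))")
```

Here `subst_free` mirrors the juxtaposition over $j$ in 4.9: each symbol is either copied or replaced by the term, depending on whether the position is free.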


5 Enumerability of Deducible Formulas 


5.1. General setup. Let L be any language with a numbered countable 
alphabet A. We suppose that the following data is fixed: 


(a) An enumerable set of “axioms” $\mathrm{Ax} \subset S(A)$.
(b) A partial recursive function $\mathrm{Inf} : \mathbf{Z}^+ \times S(S(A)) \to S(A)$, i.e., an enumerable
family of “rules of deduction.”


We shall say that an expression $P \in S(A)$ is a direct consequence of the
expressions $P_1, \ldots, P_r$ by the $i$th rule of deduction if $(i, (P_1, \ldots, P_r)) \in D(\mathrm{Inf})$
and $\mathrm{Inf}(i, (P_1, \ldots, P_r)) = P$. We shall call an expression $P$ deducible (from the
“axioms”) if there exists a finite sequence of expressions $P_1, \ldots, P_n = P$ such
that for each $j \le n$ either $P_j \in \mathrm{Ax}$ or there exist $i \in \mathbf{Z}^+$ and
$\{P_{k_1}, \ldots, P_{k_r}\} \subset \{P_1, \ldots, P_{j-1}\}$ such that $P_j$ is a direct consequence of
$P_{k_1}, \ldots, P_{k_r}$ by the $i$th rule of deduction. We let $D$ denote the set of all
deducible expressions.


5.2. Proposition. D is enumerable. 


Proof. Let $a : \mathbf{Z}^+ \to S(A)$ be a recursive function whose image coincides
with Ax, and let $\mathrm{inf} : \mathbf{Z}^+ \to S(A)$ be the partial recursive function given by
$\mathrm{inf}(n) = \mathrm{Inf}(t_1^{(2)}(n), N_1^{-1}(t_2^{(2)}(n)))$, where $N_1 : S(S(A)) \to \mathbf{Z}^+$ is any
numbering of the texts that is compatible with the given numbering of the expressions.
We construct a recursive function $d : \mathbf{Z}^+ \to S(A)$ as follows:

\[
d(2n - 1) = a(n), \qquad d(2n) = \mathrm{inf}(n), \qquad n \ge 1.
\]

We claim that its image is $D$. In fact, it suffices to verify that (a) $\mathrm{Ax} \subset$ image
of $d$; and (b) if $P_1, \ldots, P_r \in$ image of $d$ and $P$ is a direct consequence of
$P_1, \ldots, P_r$ by the $i$th rule of deduction, then $P \in$ image of $d$.

But (a) is obvious, since all the axioms are written out in the odd-numbered
places. To verify (b), we choose $n$ such that

\[
t_1^{(2)}(n) = i, \qquad t_2^{(2)}(n) = N_1((P_1, \ldots, P_r)).
\]

Then $d(2n) = P$. The proposition is proved.
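The idea of listing deducible expressions by interleaving axioms with rule applications can be sketched as a Python generator. This is a variant chosen for illustration rather than the function $d$ of the proof: at each stage it emits the next axiom and then applies every rule to all pairs of expressions already produced. The toy axioms, the tuple encoding of implications, and the names `enumerate_deducible`, `axiom`, and `modus_ponens` are assumptions of the sketch.

```python
from itertools import count, islice


def enumerate_deducible(axiom, rules):
    """Yield, without repetition, every expression obtainable from the axioms
    axiom(1), axiom(2), ... by repeatedly applying the two-premise rules."""
    found, seen = [], set()
    for stage in count(1):
        candidates = [axiom(stage)]
        for rule in rules:
            for p in found:
                for q in found:
                    conclusion = rule(p, q)      # None means "rule not applicable"
                    if conclusion is not None:
                        candidates.append(conclusion)
        for e in candidates:
            if e not in seen:
                seen.add(e)
                found.append(e)
                yield e


def axiom(n):
    """Toy axioms: p0, then p0 -> p1, p1 -> p2, ..."""
    if n == 1:
        return "p0"
    return ("->", f"p{n - 2}", f"p{n - 1}")


def modus_ponens(p, q):
    """From p and ('->', p, r) conclude r."""
    if isinstance(q, tuple) and q[0] == "->" and q[1] == p:
        return q[2]
    return None


first_six = list(islice(enumerate_deducible(axiom, [modus_ponens]), 6))
```

Every deducible expression appears eventually, because each step of a finite deduction is produced at some stage and all later stages reapply the rules to it. The generator only enumerates; it gives no way to decide that an expression is not deducible.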


We now verify that the general setup in 5.1 can always be realized in languages
of $\mathcal{L}_1$.


5.3. The rules of deduction Gen and MP. We define the map
$\mathrm{Inf} : \mathbf{Z}^+ \times S(S(A)) \to S(A)$ as follows:

\[
\begin{aligned}
D(\mathrm{Inf}) &= \{(1, (P, (P) \to (Q))) \mid P \text{ and } Q \text{ are formulas}\} \cup \{(i, (P)) \mid P \text{ is a formula},\ i \ge 2\},\\
\mathrm{Inf}(1, (P, (P) \to (Q))) &= Q,\\
\mathrm{Inf}(i, (P)) &= \forall x_{i-1}(P),
\end{aligned}
\]

where $x_j$ is the $j$th variable in $L$ in any fixed numbering of the variables that
has image $\mathbf{Z}^+$ and is compatible with the numbering of $A$. It is clear that Inf
is recursive and exhausts the rules of deduction Gen and MP.


5.4. The axioms. We verify that the following sets are enumerable in any
language in $\mathcal{L}_1$:


(a) The tautologies.
(b) The logical quantifier axioms.
(c) The axioms of equality.


Two other sets we show to be enumerable are:


(d) The special axioms of $L_1$Ar.
(e) The special axioms of $L_1$Set.


Actually, using the methods of §4 it is not hard to prove that all of these sets 
are even decidable. But the proof of enumerability is somewhat shorter, and 
will suffice for our purposes. 


5.5. The tautologies. In §5 of Chapter II we constructed a finite list of basis
tautologies and showed that all the other tautologies can be deduced from
them using MP. Thus, by Proposition 5.2, it is sufficient to verify that the basis
tautologies are enumerable.

Each of the basis tautologies determines a set of formulas of the form

\[
Q_1 P_{i_1} Q_2 P_{i_2} \cdots P_{i_r} Q_{r+1},
\]

where the $Q_i$ are fixed expressions that are nonempty (with the possible
exception of $Q_1$ and $Q_{r+1}$), $i_1, \ldots, i_r \in \{1, \ldots, m\}$, and $(P_1, \ldots, P_m)$ varies
over all ordered $m$-tuples of formulas in $L$. Since the set of such $m$-tuples is
decidable by 4.5 above, and since the operation of juxtaposition is recursive, it
is clear that we obtain an enumerable set of formulas.


5.6. The logical quantifier axioms. In case our dialect of $\mathcal{L}_1$ does not have $\exists$,
these axioms can be expressed as the following two axiom schemes:


(a) $(\forall x(P(x))) \to (P(t))$, if $x$ does not bind the term $t$ in the formula $P$.
(b) $(\forall x((P) \to (Q))) \to ((P) \to (\forall x(Q)))$, if $x$ does not occur freely in $P$.


By 4.8, the set of triples $\{(x,P,t) \mid x$ does not bind $t$ in $P\}$ is decidable,
and by 4.9, the map $(x, P, t) \mapsto P(t)$ is recursive. Since juxtaposition is also
recursive, the set of axioms (a) is the image of a decidable set under a recursive
function, and so is enumerable.

We may similarly conclude that (b) is enumerable if we verify that the
condition “$x$ does not occur freely in $P$” is decidable. But this is equivalent to
the following condition: “the formula obtained from $P$ by substituting either
of the variables $x_1$ and $x_2$ in place of all free occurrences of $x$ in $P$ coincides
with $P$,” where $(x_1, x_2)$ is any fixed pair of distinct variables. This condition is
decidable by 4.9.

5.7. The axioms of equality. By the definition in 4.6 of Chapter II, it suffices
to show that the set of formulas of the form

\[
(x = y) \to (P(x,x) \to P(x,y))
\]

is enumerable, where $P$ runs through the atomic formulas in the language,
$x$ and $y$ are variables, and $P(x,y)$ is obtained from $P$ by replacing $x$ by $y$ in
any subset of the occurrences of $x$ in $P$. This set of formulas can be obtained,
for example, as the image of the following function, which is partial recursive
by the results in 4.4 and 4.6:

\[
S(A) \times A^1 \times A^1 \times S(\mathbf{Z}^+) \to S(A);
\]

\[
(P, (x), (y), (i_1, \ldots, i_r)) \mapsto \text{the expression obtained by substituting } y \text{ in the } i_1, \ldots, i_r \text{ places in the atomic formula } P \text{ if } x \text{ occurs in those places.}
\]


5.8. The special axioms of $L_1$Ar and $L_1$Set. Most of these axioms contain only 
variables of the language, and not “metalanguage” variables for formulas. This 
is true of all the axioms of arithmetic except for induction and all the axioms of 
set theory except for replacement. Each set of axioms not containing variable 
formulas is decidable because it can be described by a condition such as “the 
set of formulas of length 40 in which “(” is in the first place, “V” is in the second 
place, a variable is in the third place, “(” is in the fourth place, ..., “)” is in 
the 39th place, and “)” is in the 40th place; in which the variables in the 3rd, 
8th, and 16th places are the same, in the 9th and 36th places are the same, and 
in the 17th and 37th places are the same; and in which these three variables 
are distinct.” (This is the axiom of regularity in $L_1$Set in normalized notation.) 
Here we could also write down just one copy of each such axiom and generate 
the rest using Gen, the axiom of specialization, and MP. 

The axioms of induction and replacement are shown to be enumerable using 
the same procedure as in the case of the basis tautologies and the quantifier 
axioms. We leave the details to the reader. 


6 The Arithmetical Hierarchy 


6.1. Using recursion on $n$, we define the classes $\Sigma_n$ and $\Pi_n$ of subsets of
$(\mathbf{Z}^+)^m$, $m = 0, 1, 2, \ldots$, as follows:


(a) $\Sigma_0 = \Pi_0 = \{$decidable sets$\}$.
(b) $\Sigma_{n+1} = \{$projections of elements of $\Pi_n$ having codimension $\ge 1\}$.
(c) $\Pi_{n+1} = \{$complements of elements of $\Sigma_{n+1}$ in their ambient spaces $(\mathbf{Z}^+)^l\}$.


Obviously, $\Sigma_1$ consists of all enumerable sets (see Theorem 1.2 of Chapter VI),
and $\Pi_1$ consists of their complements. The following result justifies calling
$\{\Sigma_n, \Pi_n\}$ “the arithmetical hierarchy.”


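6.2. Proposition.


(a) $\forall n \ge 0$, $\Sigma_n \cup \Pi_n \subset \Sigma_{n+1} \cap \Pi_{n+1}$.

(b) $\bigcup_{n=0}^{\infty} \Sigma_n = \bigcup_{n=0}^{\infty} \Pi_n = \{$arithmetical sets$\}$, i.e., all sets definable by formulas
in $L_1$Ar.

(c) For $n \ge 1$ the sets in $\Sigma_n$ are precisely those that can be defined by formulas
of the following $\mathcal{L}_1$ type (where the quantifiers are taken over variables in
$\mathbf{Z}^+$, and $E$ is a decidable set):

\[
\begin{aligned}
& \exists x_1\, \forall x_2\, \exists x_3 \cdots \forall x_n\, \neg((x_1, \ldots, x_n, x_{n+1}, \ldots, x_m) \in E), && n \text{ even};\\
& \exists x_1\, \forall x_2\, \exists x_3 \cdots \exists x_n\, ((x_1, \ldots, x_n, x_{n+1}, \ldots, x_m) \in E), && n \text{ odd}.
\end{aligned}
\]

Similarly, for $\Pi_n$:

\[
\begin{aligned}
& \forall x_1\, \exists x_2\, \forall x_3 \cdots \exists x_n\, ((x_1, \ldots, x_n, x_{n+1}, \ldots, x_m) \in E), && n \text{ even};\\
& \forall x_1\, \exists x_2\, \forall x_3 \cdots \forall x_n\, \neg((x_1, \ldots, x_n, x_{n+1}, \ldots, x_m) \in E), && n \text{ odd}.
\end{aligned}
\]

(d) The sets in $\Sigma_n$ and $\Pi_n$ are definable by the analogous formulas in $L_1$Ar
with the following changes: instead of $(x_1, \ldots, x_m) \in E$ we have any atomic
formula, and the number of quantifiers is $\ge n$, with exactly $n - 1$ alternations
from $\exists$ to $\forall$ or from $\forall$ to $\exists$.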


PROOF. 


(a) We use induction on $n$. For $n = 0$ we have $\Sigma_0 \cup \Pi_0 \subset \Sigma_1 \cap \Pi_1$, by
the definition of decidable sets. If $\Sigma_{n-1} \subset \Sigma_n$, then $\Sigma_n \subset \Sigma_{n+1}$ (since $\Sigma_{n+1}$
consists of projections of the complements of elements of $\Sigma_n$, and $\Sigma_n$ consists
of projections of the complements of elements of $\Sigma_{n-1}$), and also $\Pi_n \subset \Pi_{n+1}$,
by the definition of $\Pi$. Finally, we have $\Pi_n \subset \Sigma_{n+1}$, from which it trivially
follows that $\Sigma_n \subset \Pi_{n+1}$. In fact, if $E \in \Pi_n$, then $E \times \mathbf{Z}^+ \in \Pi_n$ (since taking
the product with $\mathbf{Z}^+$ commutes with complements and projections, and takes
$\Sigma_0 = \Pi_0$ to itself), and hence $E$ = a projection of $E \times \mathbf{Z}^+ \in \Sigma_{n+1}$.

(b) It follows from (a) that $\bigcup_{n=0}^{\infty} \Sigma_n = \bigcup_{n=0}^{\infty} \Pi_n$. This class of sets is
contained in the arithmetical sets, since all enumerable sets are arithmetical,
and arithmeticality is preserved on taking projections and complements, which
correspond to inserting $\exists$ and $\neg$, respectively.

In order to prove the converse $\{$arithmetical sets$\} \subset \bigcup_{n=0}^{\infty} \Sigma_n = \Sigma_\infty$, we first
note that all sets definable by atomic formulas are decidable, and the rest of the
arithmetical sets are obtained from them by taking projections, complements,
unions, and intersections (see §2 of Chapter II). Thus, it suffices to show that
$\Sigma_\infty$ is closed with respect to (finite) unions and intersections. We claim that
this is actually true for each $\Sigma_n$ separately.

We prove this by induction on $n$. The result has already been proved for $\Sigma_0$.
If $\Sigma_n$ is closed with respect to $\cap$, then $\Pi_n$ is closed with respect to $\cup$. Suppose
$E_1, E_2 \in \Sigma_{n+1}$, $E_i$ = a projection of $F_i$, and $F_i \in \Pi_n$. We can then introduce
dummy variables so as to identify the ambient spaces of the $F_i$, and the projection
of these spaces onto an ambient space for both the $E_i$. Then $E_1 \cup E_2$ = a
projection of $F_1 \cup F_2$, so that $E_1 \cup E_2 \in \Sigma_{n+1}$. Thus, $\Sigma_{n+1}$ is closed with respect
to $\cup$.

Similarly, if $\Sigma_n$ is closed with respect to $\cup$, it follows that $\Pi_n$ is closed with
respect to $\cap$, and an analogous argument shows that $\Sigma_{n+1}$ is closed with respect
to $\cap$. However, here we must embed the products $F_1 \times (\mathbf{Z}^+)^{m_2}$ and $(\mathbf{Z}^+)^{m_1} \times F_2$
for certain $m_1$ and $m_2$ in a single space in such a way that when we identify
the two projections, we have $\mathrm{pr}(F_1 \times (\mathbf{Z}^+)^{m_2} \cap (\mathbf{Z}^+)^{m_1} \times F_2) = \mathrm{pr}\, F_1 \cap \mathrm{pr}\, F_2$.
In terms of formulas this means that the variables bound by the $\exists$ quantifiers
in the formulas corresponding to $F_1$ and $F_2$ must be renamed so that they form
two disjoint sets.

(c) This assertion is proved by induction on $n$ and a simple examination
of the definitions. Here, whenever we take the complement, we must
move the corresponding $\neg$ to the right of all the quantifiers by means of
the usual commutation rule $\neg\forall = \exists\neg$, $\neg\exists = \forall\neg$. If we have a projection of
codimension $m \ge 2$, which is defined by a series of quantifiers $\exists x_{i_1} \cdots \exists x_{i_m}$,
we must reduce it to a projection of codimension 1 by replacing the set of
variables $(x_{i_1}, \ldots, x_{i_m})$ by $(t_1^{(m)}(y), \ldots, t_m^{(m)}(y))$ in $E$ and replacing the series of
quantifiers by $\exists y$.

(d) The proof is analogous to that of (c). Here we use the fact that the
sets in $\Sigma_0$ are Diophantine, and we observe that in general, $\exists \cdots \exists$ cannot be
replaced by $\exists$ in this case.
The proposition is proved.
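The reduction in part (c) of the proof, where a block of existential quantifiers is replaced by a single one, can be illustrated in Python for $m = 2$. The sketch assumes a particular bijection $\mathbf{Z}^+ \to (\mathbf{Z}^+)^2$ (the inverse of Cantor's diagonal enumeration), which need not coincide with the function $t^{(2)}$ used in the text; the names `unpair` and `first_witness` are hypothetical.

```python
def unpair(y):
    """A bijection Z+ -> Z+ x Z+: walk the diagonals x1 + x2 = 2, 3, 4, ..."""
    d = 1
    while d * (d + 1) // 2 < y:       # find the diagonal containing y
        d += 1
    offset = y - d * (d - 1) // 2     # position on that diagonal, 1..d
    return offset, d + 1 - offset


def first_witness(pred, limit):
    """Search for a single y <= limit such that pred(t1(y), t2(y)) holds.
    This replaces the double search 'exists x1 exists x2' by 'exists y'."""
    for y in range(1, limit + 1):
        if pred(*unpair(y)):
            return y
    return None
```

For instance, `first_witness(lambda a, b: a * b == 12 and a + b == 7, 100)` searches one variable instead of two and stops at the code of the pair $(3, 4)$.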


6.3. Theorem. For all $n \ge 1$, $\Sigma_n \setminus \Pi_n \ne \varnothing$.


Proof. The assertion that $\Sigma_1 \setminus \Pi_1 \ne \varnothing$ is precisely Theorem 5.8 of Chapter V
on the existence of undecidable enumerable sets. We prove the general case by
an analogous diagonal process applied to a versal family.

Let $\{E_k\}$ be a versal family of enumerable $(n+1)$-sets over $\mathbf{Z}^+$, and let $E$
be its total space:

\[
(k, x_0, \ldots, x_n) \in E \iff (x_0, \ldots, x_n) \in E_k.
\]

To fix ideas, suppose $n$ is even. We set

\[
F = \{k \mid \exists x_1\, \forall x_2 \cdots \forall x_n\, \neg((k, k, x_1, \ldots, x_n) \in E)\} \subset \mathbf{Z}^+.
\]

By 6.2(c), we have $F \in \Sigma_n$. Since $\{E_k\}$ is versal, it follows by 6.2(c) that any
subset of $\mathbf{Z}^+$ in $\Pi_n$ can be represented in the form

\[
F_{k_0} = \{x_0 \mid \neg\exists x_1\, \forall x_2 \cdots \forall x_n\, \neg((k_0, x_0, x_1, \ldots, x_n) \in E)\}
\]

for some $k_0 \in \mathbf{Z}^+$. It is clear that $k_0$ lies either in $F \setminus F_{k_0}$ or in $F_{k_0} \setminus F$. Hence
$F \ne F_{k_0}$, and $F \in \Sigma_n \setminus \Pi_n$.
The other cases are handled analogously.


6.4. Remarks 


(a) From the point of view of the theorems of Tarski and Gödel, the results in
6.2 and 6.3 show us the tremendous distance from provability to truth: $D \in \Sigma_1$,
while $T$ falls not only outside $\Sigma_1$, but even outside $\Sigma_\infty$. In the next section we
indicate some mileposts along the way from $D$ to $T$.

(b) Although not really formally justified by the above considerations,
it nevertheless makes sense to classify arithmetic problems, i.e., questions “Is
it true that $P \in T$?” according to the number of alternations between $\exists$ and $\forall$
when the closed formula $P$ is written as in 6.2(c).
As we showed in §1 of Chapter I, the Fermat conjecture is expressed by a
$\Pi_1$-formula, and the Riemann hypothesis is expressed by a $\Pi_3$-formula,
although there is an assertion of type $\Pi_1$ that is equivalent to the RH.
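
As an illustration of this classification (a sketch in the notation of 6.2(c), not a formula quoted from Chapter I), the Fermat conjecture can be written with universal quantifiers only in front of a decidable condition:

\[
\forall x\, \forall y\, \forall z\, \forall n\ \neg\bigl((x, y, z, n) \in E\bigr), \qquad E = \{(x, y, z, n) \in (\mathbf{Z}^+)^4 \mid n \ge 3,\ x^n + y^n = z^n\}.
\]

The set $E$ is decidable, since membership is checked by a finite computation, and the four universal quantifiers can be contracted into one by the same device as in the proof of 6.2(c). There are no alternations of quantifiers, which is what places the statement at the level $\Pi_1$.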

H. Rogers writes that 


Almost all statements which (i) have been extensively studied by 
mathematicians and (ii) are known to be arithmetically expressible can 
be seen, from a relatively superficial examination, to have quite low 
level in the %,, classification. As has been occasionally remarked, the 
human mind seems limited in its ability to understand and visualize 
beyond four or five alternations of quantifier. Indeed, it can be argued 
that the inventions, subtheories, and central lemmas of various parts of 
mathematics are devices for assisting the mind in dealing with one or 
two additional alternations of quantifier. 


7 Productivity of Arithmetical Truth 


7.1. In this section we discuss a final feature of Gödel’s theorem: the possibility, 
starting from any enumerable set of truths of arithmetic that we already know, 
effectively to enlarge this set by adding new truths. To see this more clearly, 
we examine the original version of the proof, in which the diagonal method is 
explicit, rather than hidden in the construction of an undecidable enumerable 
set. It is convenient to describe this version by comparing it with the proof of 
Tarski’s theorem. 


7.2. Suppose we are given a language of arithmetic ($L_1$Ar, SAr, or an extension 
of one of them). Further suppose that we have chosen a fixed numbering of 
its alphabet, which determines a fixed numbering N of the formulas. (It is 
essential to note that the construction that follows is not invariant if we replace 
our numbering by an equivalent one.) 

Both the Tarski and the Gödel arguments are based on the following
“self-reference lemma”: 


7.3. Lemma. Given any formula $P(x)$ in the language that has one free variable,
we can effectively construct a closed formula $Q_P$ that says, “my number does
not belong to the set defined by $P$.” In other words, $Q_P$ is true if and only if
$P(\overline{N(Q_P)})$ is false, where $\overline{N(Q_P)}$ is the term-name for $N(Q_P)$.


Proof. This lemma was proved for SAr in §11 of Chapter II. In $L_1$Ar we
construct the formula $Q_P$ as follows.

If $R(x)$ is a formula with one free variable, we call the formula $R(\overline{N(R(x))})$
its diagonalization. Let diag : $\mathbf{Z}^+ \to \mathbf{Z}^+$ be the partial function

\[
\text{the } N\text{-number of a formula with one free variable} \;\mapsto\; \text{the } N\text{-number of its diagonalization.}
\]

It is easy to show, using the results and methods in §4, that diag is
computable. Thus, its graph is definable by a formula in $L_1$Ar that can be
explicitly constructed. We denote this formula by “$y = \mathrm{diag}\, x$,” construct the
formula $R(x) : \exists y(\text{“}y = \mathrm{diag}\, x\text{”} \wedge P(y))$, and finally set

\[
Q_P : \neg R(\overline{N(\neg R(x))}) = \text{the diagonalization of } \neg R(x).
\]

By the definitions, we then have

\[
\begin{aligned}
Q_P \text{ is true} &\iff \text{the number of } \neg R(x) \text{ does not satisfy } R(x)\\
&\iff \text{the number of the diagonalization of } \neg R(x) \text{ does not satisfy } P(x)\\
&\iff \text{the number of } Q_P \text{ does not satisfy } P(x).
\end{aligned}
\]

The lemma is proved.
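The diagonal construction has a close analogue for program text, which may help in following the proof. In the Python sketch below, strings stand in for formulas, `repr` stands in for the term-name of a number, and evaluating a string stands in for truth; all of this is an assumption of the illustration, and the names `diag`, `self_reference_sentence`, and `truth` are hypothetical. The string returned plays the role of $Q_P$: it holds exactly when the predicate `P` rejects that very string.

```python
def diag(s):
    """Diagonalization: substitute the quotation of s for the free variable x in s."""
    return s.replace("x", repr(s))


def self_reference_sentence():
    """Build the analogue of Q_P as the diagonalization of 'not R(x)',
    where R(x) is read as P(diag(x))."""
    not_r = "not P(diag(x))"
    return diag(not_r)


def truth(sentence, P):
    """Evaluate a sentence, interpreting the symbol P by the given predicate."""
    return eval(sentence, {"P": P, "diag": diag})
```

Whatever predicate is supplied for `P`, the value of `truth(q, P)` for `q = self_reference_sentence()` is the negation of `P(q)`, which is the analogue of the equivalences displayed at the end of the proof.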


We note that it requires a large amount of technical work to verify
that “$y = \mathrm{diag}\, x$” is definable in $L_1$Ar, which is why we used SAr instead in
Chapter II. 


7.4. The arguments of Tarski and Gödel now take the following parallel form: 
Tarski: 


(a) Suppose that truth is definable by a formula P. 

(b) Then there is a formula Qp that says “I am not true.” 

(c) The formula Qp cannot be false (because of its semantics). 
(d) The formula Qp cannot be true (because of its semantics). 
(e) Therefore, truth is not definable. 


Gödel: 


(a) Provability is definable by a formula P. 

(b) There is a formula Qp that says “I am not provable.” 

(c) The formula Qp cannot be false (because of its semantics, since otherwise 
it would be provable, and hence true). 

(d) Therefore, Qp is true. 

(e) Therefore, Qp is not provable (because of its semantics). 


We note that in the above paraphrasing of Gödel’s argument, part (c)
explicitly uses the stipulation that only true formulas are provable. When
Gödel’s paper appeared in 1931, specialists were very busy looking for finitistic
proofs that the axioms of arithmetic are consistent, so that stipulating that
$D \subset T$ would have run counter to the spirit of the times. Therefore, in Gödel’s
own original wording the argument looks somewhat different. This distinction
is traditionally explained in great detail in all textbooks on logic. However,
we shall be satisfied with remarking that if $D \not\subset T$, then $D \ne T$, and the
incompleteness theorem is trivially true. But in that case we would be in such
bad shape that we would no longer care about completeness or incompleteness.

The main point we are interested in is the following: given any fixed
conception of provability that leads to an enumerable (or even to an arithmetical)
set $D$ of provable true formulas, we can effectively construct a new formula
that is true but not provable. We now define more precisely what we mean by
“effectively.” 


7.5. Definition. A set $F \subset \mathbf{Z}^+$ is said to be productive relative to a versal
family $\{E_k\}$ of 1-sets if there exists a partial recursive function $f$ such that for
all $k \in \mathbf{Z}^+$ with $E_k \subset F$, we have $k \in D(f)$ and $f(k) \in F \setminus E_k$.


7.6. Proposition. Under the conditions in 7.2, the set of numbers of true
formulas is productive relative to the versal family $\{E_k\}$ constructed in §8 of
Chapter VI.


PROOF. To fix ideas, we shall work with the language $L_1$Ar. We first
construct an enumerable family $\{P_k(x_1)\}$ of formulas with one free variable $x_1$ such
that $P_k$ defines $E_k$. To do this, we define a sequence of terms $f[k]$ in $L_1$Ar as
in 8.1(a) of §8, Chapter VI, by setting 


-), k times; 
= t= itis - + 1)st variable in Lj Ar; 
_ f[t2(k)]); 


We then write 


Py = Sera(Sarg +++ Gere (Flta(&)] = Flta()))) +). 


It is easy to see, using the methods in §4, that the function $k \mapsto N(P_k)$ is
recursive. We next fix a translation of “$y = \mathrm{diag}\, x$” and set

\[
\begin{aligned}
R_k &= \exists x_2((\text{“}x_2 = \mathrm{diag}\, x_1\text{”}) \wedge (P_k(x_2))),\\
Q_{P_k} &= \neg(R_k(\overline{N(\neg(R_k))})),
\end{aligned}
\]

and finally

\[
f(k) = N(Q_{P_k}).
\]

This function is computable because $N(P_k)$ is computable. By Lemma 7.3, it
satisfies the condition 7.5 with $T$ in place of $F$.


7.7. The concept of productivity gives us the following approach to the problem
of exhausting $T$: we begin with the set $D_0$ of formulas that are provable in
the Peano axiom system $\mathrm{Ax}_0$; we define $D_0$ by a formula $P_0$; we set
$\mathrm{Ax}_1 = \mathrm{Ax}_0 \cup \{Q_{P_0}\}$; and we similarly construct $D_1$, $P_1$, and
$\mathrm{Ax}_2 = \mathrm{Ax}_1 \cup \{Q_{P_1}\}$, and so on. It follows from Gödel’s theorem that as long as
we do all this “uniformly effectively,” we cannot obtain all of $T$ even after
transfinitely many steps. However, S. Feferman has shown that if we are willing to
dispense with effectiveness, we can obtain all of TAr in this way. We conclude this
section by formulating Feferman’s result, which gives unexpected and philosophically
interesting information about TAr. We omit the proof and the technical details
(see Feferman’s original article “Transfinite recursive progressions of axiomatic
theories,” J. Symb. Logic 27, no. 3 (1962), 259-316).


7.8. Principles of extension. In the first place, in order to exhaust TAr it is not
enough to add Gödel’s formula to $\mathrm{Ax}_i$ at every step. There are many other ways
of constructing intuitively true formulas that in various ways formalize “having
faith in the axioms $\mathrm{Ax}_i$.”

Feferman, in particular, uses the following construction. Suppose that
we have already constructed the axiom system $\mathrm{Ax}_\alpha$ (where $\alpha$ is an ordinal),
and that the set of numbers of formulas deducible from $\mathrm{Ax}_\alpha$ is defined by
the formula $D_\alpha$. For any formula $P(x)$ with one free variable, we construct a
formula $B^P$ that has the intuitive meaning “if $P(\bar{n})$ is provable (from $\mathrm{Ax}_\alpha$) for
all term-names $\bar{n}$ of natural numbers, then $\forall x\, P(x)$ is true.” These formulas $B^P$
must lie in $T$, and we can set

\[
\begin{aligned}
\mathrm{Ax}_{\alpha+1} &= \mathrm{Ax}_\alpha \cup \{B^P \mid \text{all } P\};\\
\mathrm{Ax}_\beta &= \bigcup_{\alpha < \beta} \mathrm{Ax}_\alpha, \quad \text{if } \beta \text{ is a limit ordinal.}
\end{aligned}
\]

Here is a method for giving $B^P$ explicitly. The function $n \mapsto N(P(\bar{n}))$ is
computable as a function of $n$ and $N(P)$. We define its graph by a formula
$M(x,y,z)$, so that for $l, m, n \in \mathbf{Z}^+$,

\[
M(\bar{l}, \bar{m}, \bar{n}) \text{ is true} \iff l \text{ is the number of a formula } P \text{ with one free variable } x, \text{ and } m \text{ is the number of } P(\bar{n}).
\]

We then set

\[
B^P = \forall y\, \forall z\,(M(\overline{N(P)}, y, z) \to D_\alpha(y)) \to \forall x\, P(x).
\]


7.9. The problem of choosing $D_\alpha$. This is the subtlest part of the proof. Here it
is crucial to show that $D_\beta$ exists when $\beta$ is a limit ordinal.

Feferman shows how the $D_\alpha$ can be constructed for a suitable countable
sequence of ordinals with limit $\gamma$ not exceeding $\omega^{\omega^{\omega+1}}$ so that the following
result will be true.


7.10. Theorem. All true formulas in $L_1$Ar are deducible from $\bigcup_{\alpha < \gamma} \mathrm{Ax}_\alpha$.

Thus, suppose we have accepted the Peano axioms. Then, in order to attain
the total truth in arithmetic, we must perform a transfinite sequence of acts of
faith in our not having been led astray by the previous acts of faith.


8 On the Length of Proofs 


8.1. The title of this section is taken from a short paper written by Gödel
in 1936. His article consists of a precise formulation and proof of the following
qualitative assertions.

Suppose we are given a formal language $L$ together with some conception
of deducibility of a formula $P$ from a (variable) set of formulas $a$. Suppose, in
addition, that we are actually given a function that estimates the “complexity of
deduction” of a formula $P$ from the set $a$. (In languages of $\mathcal{L}_1$, this “complexity”
could be the minimal size of a deduction of $P$ from $a$, i.e., the number of signs
of a fixed finite protoalphabet needed for such a deduction; note that the use of
the word “complexity” here has nothing to do with the Kolmogorov complexity
in §9 of Chapter VI.) We further assume that $L$ contains a certain fragment of
the logic of $\mathcal{L}_1$, that $D$ and $a$ are rich enough for the incompleteness principles
to take effect, and that the “complexity of deduction” satisfies certain natural
axioms. We then have the following facts: 


(a) There exist formulas deducible from a whose deduction is arbitrarily more 
complex than the formula itself. 


Observation shows that this somewhat vaguely defined class includes, if 
not the most important, at least the most “prized” mathematical facts. 


(b) If we add any independent formula $A$ to the axioms $a$, then we can find
formulas deducible from $a$ whose deduction from $a \cup \{A\}$ is arbitrarily less
complex than from $a$ (the principle of cutting down proofs).


Compare the great strength of “analytic” methods relative to
“elementary” methods in number theory.

The following more precise presentation of these ideas is based on a short
article by Ehrenfeucht and Mycielski in Bull. Amer. Math. Soc. 77, No. 3 (1971),
366-367. 


8.2. We consider the following set of data. 


(a) A countable alphabet $A$ with a fixed numbering $N : A \to \mathbf{Z}^+$.

(b) A subset $F \subset S(A)$ whose elements are called formulas.

(c) A partial function $D : P(F) \to P(F)$ that to certain subsets $a \subset F$ associates
sets $D(a)$ of formulas “deducible from $a$.” We shall often write $a \vdash P$
instead of $P \in D(a)$.

(d) The complexity of deduction: this is a function $\mathrm{Cd}_a(P)$ that is defined for
pairs $a \subset F$, $P \in D(a)$, and takes values in $\mathbf{Z}^+$. It is convenient to take
$\mathrm{Cd}_a(P) = \infty$ if $P \notin D(a)$.


We impose the following conditions on this data: 


8.3. (a) $A$ contains $\to$, $\neg$, (, and ).

(b) If $P$ and $Q \in F$, then $\neg(P)$ and $(P) \to (Q) \in F$. As usual, we shall write
$P \to Q$ instead of $(P) \to (Q)$, and so on.

($c_1$) $a \subset D(a)$; if $a \subset a'$ and $D$ is defined at $a$, then $D$ is defined
at $a'$ and $D(a) \subset D(a')$.

($c_2$) If $a \cup \{P\} \vdash Q$, then $a \vdash P \to Q$.

($c_3$) $a \vdash P \to (\neg P \to Q)$ for any $P, Q \in F$.

($d_0$) If $a \subset a'$, then $\mathrm{Cd}_{a'}(P) \le \mathrm{Cd}_a(P)$.

($d_1$) The set $\{(P,n) \mid \mathrm{Cd}_a(P) \le n\} \subset S(A) \times \mathbf{Z}^+$ is decidable.
Condition ($d_1$) does not actually have to hold for all $a \subset F$, but we shall
consider only those $a$ for which it is true. In the case that $\mathrm{Cd}_a(P)$ is the size
of the shortest $\mathcal{L}_1$-deduction of $P$ from $a$ in a finite protoalphabet, and $a$ is a
decidable set of axioms, ($d_1$) holds for the following reason. We can write down
all the texts in $A$ having size $\le n$ (there are a finite number of them) and then
verify for each one in turn whether it is a deduction of $P$ from $a$.
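
The brute-force argument for ($d_1$) can be sketched in Python. The sketch assumes a finite protoalphabet given as a string and a checker `is_deduction(text, p)` that decides whether a text is a deduction of `p`; the checker used in the example is a deliberately artificial stand-in, not a proof checker for any language of the text, and all names are hypothetical.

```python
from itertools import product


def complexity_at_most(p, n, alphabet, is_deduction):
    """Decide whether some text of size <= n over alphabet is a deduction of p,
    by writing down all such texts and checking each one in turn."""
    for size in range(1, n + 1):
        for letters in product(alphabet, repeat=size):
            if is_deduction("".join(letters), p):
                return True
    return False


def toy_checker(text, p):
    """Artificial stand-in: the only 'deduction' of p is the text p|p."""
    return text == p + "|" + p
```

With this stand-in, the complexity of the formula `"ab"` is 5, so `complexity_at_most("ab", 5, "ab|", toy_checker)` succeeds while the same call with bound 4 fails. The search is finite for every `n`, which is all that decidability requires, although its cost grows exponentially with `n`.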

($d_2$) There exists a general recursive function $f(x,y,z)$ that is
nondecreasing in $x$ such that

\[
\mathrm{Cd}_{a \cup \{P\}}(Q) \le f(\mathrm{Cd}_a(P \to Q), N(P), N(Q))
\]

for all $Q \in D(a)$.

Both sides of this inequality are finite because of the previous conditions:
since $a \vdash Q$, it follows by ($c_1$) that $a \cup \{P\} \vdash Q$, and then by ($c_2$) that $a \vdash P \to Q$.
We have an estimate of the type in ($d_2$) in languages of $\mathcal{L}_1$, because, starting
with any deduction of $P \to Q$ from $a$, we can obtain a deduction of $Q$ from
$a \cup \{P\}$ by simply adding $P$ and $Q$ (by modus ponens). This increases the size
of the deduction of $P \to Q$ by the sizes of $P$ and $Q$.

($d_3$) There exists a general recursive function $g(x,y)$ such that

\[
\mathrm{Cd}_a(P \to (\neg P \to Q)) \le g(N(P), N(Q)).
\]

In languages of $\mathcal{L}_1$, the formula $P \to (\neg P \to Q)$ is a logical axiom, and if $a$
contains this axiom, then the deduction has length 1 and size equal to the size
of the formula itself. Of course, the size of this formula can be represented in
the form $g(N(P), N(Q))$.

We now formulate Gödel’s theorem on “cutting down proofs.” We suppose
that the conditions and conventions in 8.2 and 8.3 are fulfilled.


8.4. Theorem. 


(a) Suppose that $a \subset F$ and $D(a)$ is undecidable. Then for any general
recursive function $l$ there exist infinitely many formulas $P \in D(a)$ such
that

\[
\mathrm{Cd}_a(P) > l(N(P)).
\]

(b) Suppose that $a' = a \cup \{A\}$ and the formula $A$ has the property that
$D(a \cup \{\neg A\})$ is undecidable. Then for any general recursive function $r$ there exist
infinitely many formulas $P \in D(a)$ such that

\[
\mathrm{Cd}_a(P) > r(\mathrm{Cd}_{a'}(P)).
\]


PROOF. 


(a) If the first assertion were false, then for a suitable $l$ and for all $P \in D(a)$
we would have $\mathrm{Cd}_a(P) \le l(N(P))$. But then the set

$$D(a) = \{P \mid \mathrm{Cd}_a(P) \le l(N(P))\} \subset S(A)$$

would be decidable by (d1), since it is obtained by applying a bounded universal
quantifier (in $n$) to the decidable set in (d1). This contradicts the assumption.

(b) Let $P \in D(a \cup \{\neg A\})$. By (d2) we have

$$\mathrm{Cd}_{a \cup \{\neg A\}}(P) \le f(\mathrm{Cd}_a(\neg A \to P), N(\neg A), N(P)).$$

If we now suppose that the second assertion of the theorem were false, then for
a suitable nondecreasing general recursive function $r$ we would obtain

$$\mathrm{Cd}_a(\neg A \to P) \le r(\mathrm{Cd}_{a'}(\neg A \to P)),$$

or, by (d2) and (d3),

$$\mathrm{Cd}_a(\neg A \to P) \le r \circ f(\mathrm{Cd}_a(A \to (\neg A \to P)), N(A), N(P)) \le r \circ f(g(N(A), N(P)), N(A), N(P)).$$

Substituting this in the above inequality for $\mathrm{Cd}_{a \cup \{\neg A\}}(P)$, for fixed $A$ we obtain
an estimate of the form

$$\mathrm{Cd}_{a \cup \{\neg A\}}(P) \le l(N(P)),$$

where $l$ is general recursive and $P \in D(a \cup \{\neg A\})$. But by the first assertion
of the theorem, this contradicts the assumption that $D(a \cup \{\neg A\})$ is
undecidable.


VIII 


Recursive Groups 


1 Basic Result and Its Corollaries 


1.1. We consider a countable “group alphabet”

$$A = \{a_1, a_1^{-1}; \dots; a_n, a_n^{-1}; \dots\}.$$

The expressions in the alphabet $A$, including the empty expression $\varnothing$, are
traditionally called words. The word $a_i \cdots a_i$ ($m \ge 1$ times) will be written
$a_i^m$; the word $a_i^{-1} \cdots a_i^{-1}$ ($m \ge 1$ times) will be written $a_i^{-m}$; and we agree to
take $a_i^0 = \varnothing$. We call a word $a_{i_1}^{m_1} \cdots a_{i_k}^{m_k}$ reduced if either it is empty or there are
no subwords of the form $a_i^{-1} a_i$ or $a_i a_i^{-1}$ when it is written in expanded form.

The operation of “joining and reducing” (by “reducing” we mean crossing
out all subwords of the form $a_i a_i^{-1}$ or $a_i^{-1} a_i$) defines a group structure with
unit $\varnothing$ (which we sometimes denote by 1) on the set of reduced words. This is
a free group $F$ with a countable set of generators $\{a_1, \dots, a_n, \dots\}$. We can also
consider nonreduced words as elements in $F$: we identify such a word with the
word obtained by reducing it.

We have a canonical numbering on $A$: $N(a_i) = 2i$, $N(a_i^{-1}) = 2i - 1$. All
properties related to the computability of operations and the enumerability of
subsets in $A$ and $S(A)$ will be considered relative to any numbering of $A$
equivalent to $N$ and any numbering of $S(A)$ compatible with $N$ (see the definitions
in §1 of Chapter VII). We shall continually be making use of the following facts.


1.2. Lemma. 


(a) The set $F$ of reduced words is decidable.

(b) The group operations in $F$ are computable.

(c) A subgroup $G \subset F$ is enumerable in $S(A)$ if and only if it has an enumerable
set of generators.

(d) A normal subgroup $H \subset G$ in an enumerable subgroup $G \subset F$ is enumerable
if and only if it is generated as a normal subgroup by an enumerable set.

(e) A homomorphism $F \to F$ is recursive if and only if the induced map
$\{a_1, \dots, a_n, \dots\} \to F$ is recursive.

The proof is a good exercise in using the techniques of Chapter VII, and we
leave it to the reader. It is convenient to begin by showing that the operation
of reducing is computable; the rest goes through more or less automatically.
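
As an informal illustration of why reducing is computable, here is a minimal Python sketch. It assumes a word is stored as a list of letter numbers under the canonical numbering $N(a_i) = 2i$, $N(a_i^{-1}) = 2i - 1$; the function names are illustrative and not taken from the text.

```python
def N(i, sign):
    """Canonical number of the letter a_i (sign > 0) or a_i^{-1} (sign < 0)."""
    return 2 * i if sign > 0 else 2 * i - 1


def inverse_letter(k):
    """Number of the inverse letter: 2i <-> 2i - 1."""
    return k - 1 if k % 2 == 0 else k + 1


def reduce_word(word):
    """Cross out subwords a_i a_i^{-1} and a_i^{-1} a_i until none remain."""
    out = []  # stack holding the reduced prefix read so far
    for k in word:
        if out and out[-1] == inverse_letter(k):
            out.pop()        # the new letter cancels the last kept letter
        else:
            out.append(k)
    return out


def multiply(u, v):
    """Group operation in F: join, then reduce."""
    return reduce_word(list(u) + list(v))


def invert(u):
    """Inverse in F: reverse the word and invert every letter."""
    return [inverse_letter(k) for k in reversed(u)]


def is_reduced(word):
    """Decide membership in the set F of reduced words."""
    return reduce_word(word) == list(word)
```

A single left-to-right pass with a stack suffices, because a cancellation can only expose a new cancelling pair at the top of the stack.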


1.3. Definition. A group is called recursive if it is isomorphic to a quotient
group of the form $G/H$, where $G \subset F$ is an enumerable subgroup and $H \subset G$
is an enumerable normal subgroup.


Here we could limit ourselves to subgroups $G \subset F$ that are generated by an
enumerable subset of the standard generators $\{a_1, \dots, a_n, \dots\}$.


1.4. REMARKS AND EXAMPLES. 


(a) Recursive groups have at most countably many elements. 

(b) Finitely presented (f.p.) groups, i.e., those that have a finite number of 
generators and relations, are recursive. In particular, finite groups and 
finitely generated (f.g.) abelian groups are recursive. 

(c) A subgroup $H$ of an f.p. group $G$ is not necessarily f.p. (or even f.g.). But
if it is finitely generated, then it is recursive.


In fact, let $\{h_1, \dots, h_m\}$ be generators of $H$. We add generators
$\{h_{m+1}, \dots, h_n\}$ of the group $G$ that are connected by a finite number of
relations, and we define a homomorphism $\varphi : F \to G$ by setting $\varphi(a_i) = h_i$
if $i \le n$ and $\varphi(a_i) = 1$ if $i > n$. The kernel $E$ of $\varphi$ is generated by a finite
number of relations between $a_1, \dots, a_n$, and by the set $\{a_{n+1}, a_{n+2}, \dots\}$. Hence $E$ is
enumerable by Lemma 1.2(d). The subgroup $\widetilde{H} \subset F$ generated by $a_1, \dots, a_m$
is also enumerable, by Lemma 1.2(c). Therefore the set $\widetilde{H} \cap E$ is enumerable.
But $\varphi$ induces an isomorphism $\widetilde{H}/(\widetilde{H} \cap E) \to H$. Consequently, $H$ is recursive.


The basic aim of this chapter is to prove the following remarkable theorem 
of Higman, which gives the converse of the simple assertion 1.4(c). (G. Higman, 
Subgroups of finitely presented groups, Proc. Royal Soc., Ser. A, vol. 262 (1961), 
455-475.) 


1.5. Theorem. 


(a) Any recursive group $G/H$ (in the notation of 1.3) can be embedded in a
suitable f.p. group $F/N$.

(b) This embedding can be made effective, i.e., it can be induced by a suitable
recursive map $G \to F$.


Here are some applications of this theorem. 


1.6. Corollary (Universal finitely presented groups). There exists an f.p. group 
U such that any f.p. group G can be embedded in U (and hence, any recursive 
group can be embedded in U). 


In fact, any f.p. group is isomorphic to the quotient of $F$ by a normal
subgroup that is generated by a finite set of reduced words in $F$ and by all $a_i$
with $i > n$ for some $n$. We let $I \subset S(S(A)) \times \mathbb{Z}^+$ be the decidable set of pairs
(a finite sequence of reduced words, $n$), and we let $N_i$ (for $i \in I$) denote the
corresponding normal subgroup. We construct the “doubly infinite” group alphabet
$\{a_{jk}, a_{jk}^{-1} \mid j, k \ge 1\}$, we identify $I$ with $\mathbb{Z}^+$ by choosing a recursive numbering
of $I$, and we define the group $U_0$ that has generators $\{a_{jk}\}$ and relations “$N_j$,
written in the alphabet $\{a_{j1}, a_{j2}, \dots\}$.” It is clear that $U_0$ is recursive. It will
also be clear from the results in the next section that $U_0$ is the free product of
all the groups $F/N_i$, so that any f.p. group can be embedded in $U_0$. Thus, any
f.p. group $U$ in which we can embed $U_0$, using Higman's theorem, is universal.


In M.K. Valiyev’s article Examples of universal finitely presented groups, 
Dok. AN SSSR, 1973, vol. 211, no. 2, a universal group U is constructed that 
has 14 generators and 42 relations, and it is mentioned that such a group can 
be constructed with only 2 generators and 27 relations. 


1.7. F.p. groups with algorithmically undecidable word problem.
Let $G$ be the group with four generators $a, b, c, d$, and with the relations

$$b^{-m} a b^{m} = d^{-m} c d^{m}, \quad \text{for all } m \in E,$$

where $E \subset \mathbb{Z}^+$ is an undecidable enumerable set. It easily follows from the
results in §2 that the equation

$$b^{-x} a b^{x} = d^{-x} c d^{x}$$

holds in $G$ only if $x \in E$. (In fact, the elements $b^{-m} a b^{m}$ for $m \ge 1$ generate
a free subgroup of $G$, so that $G$ contains the free product of the subgroups
generated by $\{b^{-x} a b^{x} \mid x \ge 1\}$ and by $\{d^{-x} c d^{x} \mid x \ge 1\}$ with amalgamation
$\{b^{-x} a b^{x} = d^{-x} c d^{x} \mid x \in E\}$.) Hence, the question whether the equation
$b^{-x} a b^{x} = d^{-x} c d^{x}$ holds is undecidable (as a mass problem indexed by $x$), and if we
embed $G$ effectively in an f.p. group, we may conclude that the word problem
is unsolvable in this f.p. group.

The existence of such groups was first established by P.S. Novikov and 
W. Boone. 


1.8. “Natural” recursive groups. In algebraic geometry over algebraic number 
fields, we find many examples of recursive groups that are not a priori finitely 
presented. We shall limit ourselves to one typical example. 

Let $O_n(\mathbb{Q})$ be the orthogonal group of automorphisms of an $n$-dimensional
linear space $L$ (over the rational numbers $\mathbb{Q}$) together with a quadratic form $f$.
Let $b$ be the corresponding bilinear form. The symmetry $\tau_x \in O_n(\mathbb{Q})$ is defined
for any vector $x \in L$ with $f(x) \ne 0$:


for all $y \in L$. The involutions $\tau_x \in O_n(\mathbb{Q})$ give us an enumerable system of
generators of $O_n(\mathbb{Q})$, and all the relations are generated by the enumerable
(indeed, decidable) system of relations

$$\tau_x^2 = 1, \qquad (\tau_x \tau_y \tau_z)^2 = 1 \quad \text{for all coplanar } \{x, y, z\}$$

(S. Becken).

The numbering of $L \cong \mathbb{Q}^n$ implicit here is taken to be compatible with any
numbering of $\mathbb{Q}$ that is compatible with the standard numbering of $\mathbb{Z}^+$ and in
which the field operations are computable.


1.9. Higman's theorem is related to the theorem that enumerable sets are
Diophantine (Chapter VI), although it was first proved earlier than the latter
result. Perhaps both facts are special cases of some general assertion about
recursive algebraic structures.

In any case, the theorem on the Diophantine nature of enumerable sets
can be used to simplify considerably the recursion-theoretic part of Higman's
proof. This was shown by Valiyev, whose construction will be given in §§5-6
(cf. Algebra i Logika, vol. 7, No. 3 (1968)). §§2-4 will be devoted to the
group-theoretic preliminaries; here we shall follow Higman.


2 Free Products and HNN-Extensions 


2.1. Suppose we are given a family of groups $(G_i)$, $i \in I$, and a family of
group homomorphisms $\alpha_i : A \to G_i$. We consider the class of families $(H, \beta_i)$ of
homomorphisms $\beta_i : G_i \to H$ such that $\beta_i \circ \alpha_i : A \to H$ does not depend on
$i \in I$. This class contains a universal family $\varphi_i : G_i \to *_A G_i$ that is unique up
to isomorphism: any other family $(H, \beta_i)$ uniquely determines and is uniquely
determined by the homomorphism $\gamma : *_A G_i \to H$ for which $\beta_i = \gamma \circ \varphi_i$.

In what follows we shall need only the case in which all the $\alpha_i$ are
embeddings. In this case $*_A G_i$ is called the free product of the groups $G_i$ with
amalgamated subgroups $\alpha_i(A) \subset G_i$. We shall generally denote the structure
maps $G_i \to *_A G_i$ by $\varphi_i$, perhaps with additional indices. We let $\varphi$ denote the
structure homomorphism $\varphi_i \circ \alpha_i : A \to *_A G_i$, which does not depend on $i$. If
$A = \{1\}$, we write simply $* G_i$ instead of $*_A G_i$; if the set of indices is $\{1, \dots, n\}$,
we write $G_1 * \cdots * G_n$, and so on. We shall continually be making use of the
following structure lemma.

Let $\alpha_i : A \to G_i$ be embeddings, and let $S_i \subset G_i$ be subsets such that

$$G_i \setminus \alpha_i(A) = \bigcup_{s \in S_i} \alpha_i(A)s, \quad \text{and} \quad \alpha_i(A)s_1 \ne \alpha_i(A)s_2 \ \text{for } s_1 \ne s_2 \in S_i.$$


2.2. Proposition. Any element in the group $*_A G_i$ can be uniquely represented
in the form

$$\varphi(a)\,\varphi_{i_1}(s_1) \cdots \varphi_{i_n}(s_n),$$

where $a \in A$, $s_k \in S_{i_k}$, $i_j \ne i_{j+1}$ for all $j$, and $n \ge 0$ depends on the element.


We shall call this the canonical expansion of an element.
For the proof of this fact and for further details, see, for example, Serre's
lecture notes Arbres, amalgames et $SL_2$.
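
In the special case $A = \{1\}$ one can take $S_i = G_i \setminus \{1\}$, and the canonical expansion becomes an alternating product of nontrivial elements from different factors. The following Python sketch computes this normal form for a toy example; the choice of factors $\mathbb{Z}/2$ and $\mathbb{Z}/3$ and all names are assumptions made for illustration only.

```python
# Toy free product Z/2 * Z/3 with trivial amalgamated subgroup.
# A syllable is a pair (factor index, residue); ORDERS[k] is the order
# of the k-th cyclic factor (an assumption made for this example).
ORDERS = (2, 3)


def normalize(syllables):
    """Return the canonical expansion: nonzero residues, alternating factors."""
    out = []  # stack of syllables already in normal form
    for factor, value in syllables:
        value %= ORDERS[factor]
        if value == 0:
            continue                       # drop trivial syllables
        if out and out[-1][0] == factor:
            _, previous = out.pop()        # merge neighbours from one factor
            merged = (previous + value) % ORDERS[factor]
            if merged:
                out.append((factor, merged))
        else:
            out.append((factor, value))
    return out


def multiply(u, v):
    """Product in the free product: concatenate, then normalize."""
    return normalize(list(u) + list(v))
```

Uniqueness of the expansion is what makes equality of elements decidable here: two products are equal exactly when their normal forms coincide.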




2.3. Corollaries 


(a) Under the conditions in 2.2, the structure homomorphisms $\varphi$ and $\varphi_i$ are
embeddings.


This allows us to identify $A$ and $G_i$ with subgroups of $*_A G_i$ using $\varphi$ and $\varphi_i$.
We shall do this in the statements that follow. However, in the several-step
constructions in the later subsections, one and the same group will be
embedded in another group in many different ways using various compositions
of the structure maps, and it will be necessary to keep careful track of these
embeddings.


(b) $G_i \cap G_j = A$ (in $*_A G_i$) for $i \ne j$.


In other words, $\varphi_i(G_i) \cap \varphi_j(G_j) = \varphi(A)$. We can use Proposition 2.2 to
prove $\subset$: otherwise we would have $\varphi_i(s_i) = \varphi_j(s_j)$, which would contradict the
uniqueness.


(c) Suppose we are given a family of embeddings $\beta_i : H_i \to G_i$ and a subgroup
$B \subset A$ such that $\beta_i(H_i) \cap \alpha_i(A) = \alpha_i(B)$ for all $i$. Then the composition

$$B \xrightarrow{\ \alpha_i\ } H_i \xrightarrow{\ \varphi_i \circ \beta_i\ } *_A G_i$$

does not depend on $i$, and therefore gives a canonical map $*_B H_i \to *_A G_i$.
This map is an embedding. In particular, the subgroup of $*_A G_i$ generated
by $\varphi_i \circ \beta_i(H_i)$ is isomorphic to $*_B H_i$.


In fact, the canonical expansion in 2.2 of an element in $*_B H_i$ goes to the
canonical expansion of the image of this element in $*_A G_i$.


(d) With the same notation, we have

$$(*_B H_i) \cap A = B \ \text{ in } *_A G_i; \qquad (*_B H_i) \cap G_i = H_i \ \text{ in } *_A G_i.$$


2.4. Generators and relations. Let $M$ be a set, and let $R$ be a subset of the free
group $F_M$ that is freely generated by $M$. We let $|M : R|$ denote the quotient
group $F_M/\bar{R}$, where $\bar{R}$ is the smallest normal subgroup of $F_M$ containing $R$.
This is what we mean by defining a group by generators ($M$) and relations ($R$).
We shall take the following liberties with notation:

(a) If $M$ has a nonempty intersection with a group that has already been
defined, then all relations coming from the relations in the earlier group
are assumed to be included in $R$, even if they are not explicitly written out.
We might completely omit any reference to $R$ if there are no other relations
besides those coming from the earlier group. For example, if $E$ and $F \subset G$
are two subgroups, then $|E \cup F|$ is the subgroup they generate in $G$, and
so on.

(b) Instead of writing, say, $a_1 a_2^{-1}$ is in $R$, we may write $a_1 = a_2$.




EXAMPLE. If the $\alpha_i : A \to G_i$ are embeddings, then $*_A G_i$ is defined by the
following generators and relations:

$$\Big|\bigcup_{i \in I} G_i : \alpha_i(a) = \alpha_j(a) \ \text{for all } a \in A,\ i, j \in I\Big|.$$


We now introduce a construction that will be fundamental for everything
that follows (G. Higman, B. Neumann, H. Neumann).
Suppose we are given two embeddings of groups $\alpha, \beta : A \to G$.


2.5. Definition. The HNN-extension of the group $G$ (relative to $A, \alpha, \beta$) is the
group

$$K = |G \cup \{t\} : t^{-1}\alpha(a)t = \beta(a) \ \text{for all } a \in A|.$$


2.6. Proposition. The following homomorphisms are embeddings:


(a) $G \to K$: $g \mapsto$ the class of $g$ modulo the relations in $K$.
(b) $G *_A t^{-1}Gt \to K$, where the free product is taken relative to the embeddings
$a \mapsto \beta(a)$ and $a \mapsto t^{-1}\alpha(a)t$.


PROOF. In the group $G * \{u^n\}$, the subgroup $U$ generated by $G$ and $u^{-1}\alpha(A)u$
is isomorphic to $G * u^{-1}\alpha(A)u$. In fact, the canonical expansion of an element
in $G * u^{-1}\alpha(A)u$ has the form $g_1 u^{-1}\alpha(a_1)u\, g_2 \cdots g_n u^{-1}\alpha(a_n)u$, where $g_1 \in G$,
$g_2, \dots, g_n \in G \setminus \{1\}$, $a_1, \dots, a_{n-1} \in A \setminus \{1\}$, $a_n \in A$, and so this expansion also
has the canonical form in $G * \{u^n\}$.

We construct the subgroup $V = G * v\beta(A)v^{-1} \subset G * \{v^n\}$ similarly.

We identify the group $W = G * w^{-1}Aw$ with $U$ and $V$ by means of the
isomorphisms that are the identity on $G$ and take $w^{-1}aw$ to $u^{-1}\alpha(a)u$ and
$v\beta(a)v^{-1}$, respectively.

We now consider the group $(G * \{u^n\}) *_W (G * \{v^n\})$. The group $G \subset W$ is
canonically embedded in it, and for all $a \in A$ the element $t = uv$ satisfies the
relation

$$t^{-1}\alpha(a)t = \beta(a),$$

because we have made the identification $u^{-1}\alpha(a)u = v\beta(a)v^{-1}$. In addition,
it is clear from Proposition 2.2 that in $(G * \{u^n\}) *_W (G * \{v^n\})$ the groups
$u^{-1}Gu$ and $vGv^{-1}$ generate a free product with amalgamation $A$ embedded
by means of the maps $a \mapsto u^{-1}\alpha(a)u$ and $a \mapsto v\beta(a)v^{-1}$, respectively. Hence,
if we conjugate by $v$, we see that $G$ and $t^{-1}Gt$ also generate a free product, as
described in the statement of 2.6.

Therefore, the subgroup

$$K' = |G \cup \{t = uv\}| \subset (G * \{u^n\}) *_W (G * \{v^n\})$$

is a homomorphic image of $K$, and assertions (a) and (b) hold for $K'$. Moreover,
the canonical map $K \to K'$ is an isomorphism. To see this it suffices to note
that there exists an isomorphism

$$K * \{v^n\} \cong (G * \{u^n\}) *_W (G * \{v^n\})$$

that takes $t \in K$ to $uv$. In particular, $t$ has infinite order in $K$. The proposition
is proved.


We shall need to refine and generalize this result in two directions. In the 
first place, we want to consider iterated HNN-extensions; in the second place, 
we are interested in the connection between HNN-extensions of a group and a 
subgroup. We now bring together all the facts we need into a single statement. 

Suppose that we are given an entire family of pairs of embeddings
$\alpha_i, \beta_i : A_i \to G$ ($i \in I$) and a subgroup $H \subset G$ with the property that
$\alpha_i^{-1}(\alpha_i(A_i) \cap H) = \beta_i^{-1}(\beta_i(A_i) \cap H) = B_i \subset A_i$ are subgroups. Under these
conditions we have the following result.


2.7 Proposition. Let

$$K_G = |G \cup \{t_i \mid i \in I\} : t_i^{-1}\alpha_i(a)t_i = \beta_i(a) \ \text{for all } i \in I,\ a \in A_i|;$$
$$K_H = |H \cup \{t_i \mid i \in I\} : t_i^{-1}\alpha_i(b)t_i = \beta_i(b) \ \text{for all } i \in I,\ b \in B_i|.$$

Then

(a) the $\{t_i\}$ freely generate a free subgroup in $K_G$;
(b) the natural maps $G \to K_G$ and $K_H \to K_G$ (the latter given by $t_i \mapsto t_i$)
are embeddings. In addition, $K_H \cap G = H$ in $K_G$.


PROOF. 


(a) If the relations in $K_G$ implied a nontrivial relation between the $t_i$, this
relation would be preserved in the quotient of $K_G$ by the smallest normal divisor
containing $G$. But in this quotient the relations $t_i^{-1}\alpha_i(a)t_i = \beta_i(a)$ become
trivial ($1 = 1$), and no restrictions are imposed on the images of the $t_i$. This
proves (a).

(b) We first consider the case that $I$ consists of one element. In the notation
used in the proof of Proposition 2.6, we consider $K_G$ as a subgroup of
$(G * \{u^n\}) *_W (G * \{v^n\})$. By Proposition 2.2, in $G * \{u^n\}$ we have

$$H * \{u^n\} \cap G * u^{-1}\alpha(A)u = H * u^{-1}\alpha(B)u,$$

and similarly, in $G * \{v^n\}$ we have

$$H * \{v^n\} \cap G * v\beta(A)v^{-1} = H * v\beta(B)v^{-1}.$$

The above identifications of $U$ and $V$ with $W$ identify these intersections with
the subgroup

$$W_0 = H * w^{-1}Bw \subset G * w^{-1}Aw = W.$$

By Corollary 2.3(c), we have a canonical embedding

$$(H * \{u^n\}) *_{W_0} (H * \{v^n\}) \to (G * \{u^n\}) *_W (G * \{v^n\}).$$

But as at the end of the proof of 2.6, the group on the left is $K_H * \{v^n\}$ and
the group on the right is $K_G * \{v^n\}$, so we obtain an embedding $K_H \to K_G$.
Furthermore (the intersection is taken in $(G * \{u^n\}) *_W (G * \{v^n\})$):

$$(H * \{u^n\}) *_{W_0} (H * \{v^n\}) \cap G * u^{-1}\alpha(A)u = H * u^{-1}\alpha(B)u,$$

so that if we now intersect with $G$, we obtain $H$. It follows a fortiori that
$K_H \cap G = H$.

We prove (b) for finite $I$ by an easy induction on the number of elements of $I$,
and then for infinite $I$ by passing to the inductive limit (which here is a union).
We leave the details to the reader.


3 Embeddings in Groups with Two Generators 


In this section we prove a result that will be used later and that shows vividly
in a simple situation how the number of generators can be decreased using
embeddings.


3.1. Proposition. 


(a) Any countable or finite group G can be embedded in a group with two 
generators. 
(b) If G is recursive, then there is such an embedding that is recursive. 


PROOF. 
(a) The group $\mathbb{Z} * \mathbb{Z} = \{b^n\} * \{v^n\}$ has a free subgroup of countable rank, for
example,

$$S = |\{b^{-i} v b^{i} \mid i > 0\}|.$$

It immediately follows from Proposition 2.2 that there are no relations between
the generators $b^{-i} v b^{i}$.

Thus, if $G$ is a free countable group, it embeds in $\mathbb{Z} * \mathbb{Z}$. If $G$ is not, we could
try to represent $G$ in the form $F/N$, where $F$ is countable and free, then embed
$F$ in $\mathbb{Z} * \mathbb{Z}$ and consider the induced homomorphism $F/N \to \mathbb{Z} * \mathbb{Z}/N'$, where
$N'$ is the normal subgroup in $\mathbb{Z} * \mathbb{Z}$ generated by $N$. Unfortunately, $N' \cap F$ may
be strictly larger than $N$, so that this homomorphism does not have to be an
embedding. The following construction shows how to deal with this problem.

Let $\{g_1, g_2, g_3, \dots\}$ be a countable system of generators of $G$, where $g_i \ne 1$.
We successively construct the following extensions of $G$:


(1) $G * \{u^n\}$;
(2) the HNN-extension of $G * \{u^n\}$,

$$|G * \{u^n\} \cup \{t_i\} : t_i^{-1} u t_i = u g_i,\ i = 1, 2, \dots|$$

(note that $u$ and the $u g_i$ generate infinite cyclic subgroups in $G * \{u^n\}$);
(3) the free product $P$ of this HNN-extension and the group $\{b^n\} * \{v^n\}$ with
subgroups $|\{t_1, t_2, \dots\}|$ and $|\{b^{-i} v b^{i} \mid i \ge 1\}|$ amalgamated by means of the
isomorphism

$$t_i = b^{-i} v b^{i}, \quad i \ge 1.$$

(4) $P$ has the two rank-2 free subgroups $|\{b, v\}|$ and $|\{u, b\}|$. There are no
relations between $u$ and $b$ because there can be no relations in the quotient
by the smallest normal subgroup containing $G$, $t_i$, and $v$.


Finally, we construct the following HNN-extension of $P$:

$$Q = |P \cup \{a\} : a^{-1}ba = u,\ a^{-1}va = b|.$$


To complete the proof, it remains to verify that $Q$ is generated by the elements
$a$ and $b$.

In fact, $Q$ has the obvious system of generators $\{g_i, t_i\ (i \ge 1);\ u, v, a, b\}$. The
relations $g_i = u^{-1} t_i^{-1} u t_i$ allow us to eliminate the $g_i$; the relations $t_i = b^{-i} v b^{i}$
allow us to eliminate the $t_i$; and the relations $u = a^{-1}ba$ and $v = aba^{-1}$ allow us
to eliminate $u$ and $v$. This proves the first part of the proposition. The following
analysis of the construction establishes part (b).

If we express $g_i$ in terms of $a$ and $b$ in $Q$ using the above relations, we find
that $g_i = e_i$, modulo the relations in $Q$, where

$$e_i = a^{-1}b^{-1}a\, b^{-i}ab^{-1}a^{-1}b^{i}\, a^{-1}ba\, b^{-i}aba^{-1}b^{i}.$$


Hence, the subgroup $E = |\{e_i \mid i \ge 1\}|$ in the group $\{a^n\} * \{b^n\}$ has the following
remarkable property: any normal subgroup $N \subset E$ generates a normal subgroup
$N'$ in $\{a^n\} * \{b^n\}$ such that $E \cap N' = N$ (compare with the remark at the
beginning of the proof).

In particular, if $\{g_i\}$ is an enumerable system of generators of $G$ that is
connected by an enumerable set of relations, it follows that the map $g_i \mapsto e_i$
(mod the relations) induces a recursive embedding of $G$ in the recursive group
$\{a^n\} * \{b^n\}/N'$, since $N'$ is enumerable whenever $N$ is.
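
The word $e_i$ can be recomputed mechanically from the relations $g_i = u^{-1}t_i^{-1}ut_i$, $t_i = b^{-i}vb^{i}$, $u = a^{-1}ba$, $v = aba^{-1}$ used in the proof above. The Python sketch below does this substitution in the free group on $a, b$; the string encoding (an upper-case letter for an inverse) and the function names are assumptions made for illustration.

```python
def reduce_word(word):
    """Freely reduce a word; 'A' stands for a^{-1} and 'B' for b^{-1}."""
    out = []
    for letter in word:
        if out and out[-1] == letter.swapcase():
            out.pop()            # adjacent inverse letters cancel
        else:
            out.append(letter)
    return "".join(out)


def inv(word):
    """Inverse word: reverse the order and invert each letter."""
    return word[::-1].swapcase()


def e(i):
    """The word e_i = u^{-1} t_i^{-1} u t_i written in a and b."""
    u = "Aba"                        # u = a^{-1} b a
    v = "abA"                        # v = a b a^{-1}
    t = "B" * i + v + "b" * i        # t_i = b^{-i} v b^{i}
    return reduce_word(inv(u) + inv(t) + u + t)
```

For example, `e(1)` evaluates to `"ABaBaBAbAbaBabAb"`, which is the displayed formula for $e_i$ with $i = 1$. No cancellation occurs between the four blocks, so the reduced word has length $4i + 12$.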


4 Benign Subgroups 


4.1. Definition-Lemma. Let $G$ be a finitely presented group, and let
$H \subset G$ be a subgroup. $H$ is called benign if the following equivalent conditions
are fulfilled:


(a) There exist a finitely presented group $K$, a finitely generated subgroup
$L \subset K$, and an embedding $G \subset K$ such that $G \cap L = H$.
(b) The HNN-extension

$$K_G = |G \cup \{t\} : t^{-1}ht = h, \ \text{for all } h \in H|$$

can be embedded in a finitely presented group.
(c) $G *_H G$ can be embedded in a finitely presented group.


PROOF OF THE EQUIVALENCE


(a) $\Rightarrow$ (b). Suppose that $G \subset K$ and $L$ satisfy (a). Then it follows by 2.6 that
$K_G$ is embedded in the HNN-extension

$$|K \cup \{t\} : t^{-1}lt = l, \ \text{for all } l \in L|.$$

This group is finitely presented: we add $t$ to the generators of $K$, and add the
relations $t^{-1}l_i t = l_i$, for a finite system of generators $\{l_i\}$ of $L$, to the relations
between the generators of $K$.


(b) $\Rightarrow$ (c). The group $G *_H G$ is embedded in $K_G$ by 2.6(b), and $K_G$ can be
embedded in an f.p. group because we have assumed condition (b).


(c) $\Rightarrow$ (a). Suppose that $G *_H G \subset M$, where $M$ is finitely presented. We set
$K = M$, we set $L$ = the image of $G$ under the composite embedding $\varphi_2 : G \to
G *_H G \to M$, and we embed $G$ in $K$ by means of $\varphi_1 : G \to G *_H G \to M$.
Since $\varphi_1(G) \cap \varphi_2(G) = H$, we have $G \cap L = H$ in $K$, as required.


The basic goal of this section is to reduce Higman’s theorem 1.5 to proving 
that all enumerable subgroups in Z * Z are benign. For this purpose and for 
later uses we shall need the following lemma. 


4.2. Lemma. Let $R$ be a benign subgroup of an f.g. free group $F$, and let $\bar{R}$ be
the normal subgroup it generates. Then $F/\bar{R}$ can be embedded in an f.p. group.


PROOF. Let $i$ be an embedding of $F *_R F$ in an f.p. group $K$ (see 4.1(c)), and
let $\varphi_1, \varphi_2 : F \to F *_R F$ be the structure maps. We consider two embeddings of
$F$ in $K \times F/\bar{R}$:

$$\alpha : f \mapsto (i \circ \varphi_1(f),\ f\bar{R});$$
$$\beta : f \mapsto (i \circ \varphi_2(f),\ 1).$$

They obviously coincide on the subgroup $R \subset F$. Hence they are induced by a
homomorphism

$$\gamma : F *_R F \to K \times F/\bar{R},$$

which has a trivial kernel, since the composition of $\gamma$ with the projection onto
$K$ coincides with $i$.

We construct an HNN-extension that takes $i \times \{1\} : F *_R F \to K \times F/\bar{R}$
to $\gamma$:

$$L = |K \times F/\bar{R} \cup \{t\} : t^{-1}(i \circ \varphi_1(f), 1)t = (i \circ \varphi_1(f), f\bar{R}),\ t^{-1}(i \circ \varphi_2(f), 1)t = (i \circ \varphi_2(f), 1) \ \text{for all } f \in F|.$$

$L$ obviously contains $F/\bar{R}$. We show that $L$ is finitely presented.

Generators of $L$: $\{t\}$ $\cup$ finite system of generators of $K$ $\cup$ finite system of
generators of $F$. This system is finite.

Relations in $L$:

(a) {the relations between the generators of $K$}.
(b) {the commutation relations between the generators of $K$ and the generators
of $F$}.

After imposing these relations, we may consider that we are working in
$K \times F$.

(c) $t^{-1}(i \circ \varphi_1(f), 1)t = (i \circ \varphi_1(f), f)$,
$t^{-1}(i \circ \varphi_2(f), 1)t = (i \circ \varphi_2(f), 1)$,
where $f$ runs through the system of generators of $F$.
(d) The relations in $\bar{R}$ between the generators of $F$.

We can take the system of relations $R_0 = (a) \cup (b) \cup (c)$ to be finite. We
need only verify that the relations in (d) follow from $R_0$.

Let $R' \subset F$ be the normal subgroup generated by $R_0$, i.e., the kernel of the
natural homomorphism $F \to |K \cup F \cup \{t\} : R_0|$. We want to show that $R' = \bar{R}$.
The inclusion $R' \subset \bar{R}$ is obvious. We verify the converse.

If $f \in F$, we set $\bar{f} = f \bmod R'$ and $f_{1,2} = i \circ \varphi_{1,2}(f) \in K$. It then follows
from the relations (b) and (c) that in $K \times F/R'$ we have

$$t^{-1}(f_1, 1)t = (f_1, \bar{f}) \quad \text{and} \quad t^{-1}(f_2, 1)t = (f_2, 1).$$

On the other hand, if $f \in R$, then, since $F *_R F$ is embedded in $K$, it follows
from the relations (a) that $f_1 = f_2$. Hence $\bar{f} = 1$, so that $R \subset R'$.


This lemma gives us the following reduction step. 


4.3. Proposition. If all enumerable subgroups in $\mathbb{Z} * \mathbb{Z}$ are benign, then
Higman's theorem is true.


PROOF. Let $G$ be the free group generated by an enumerable set of free
generators $\{g_i\}$, $i = 1, 2, 3, \dots$, and let $N \subset G$ be an enumerable normal subgroup.
We shall show how to embed $G/N$ into an f.p. group.

We first consider the embedding $G \to \{a^n\} * \{b^n\}$ given by $g_i \mapsto e_i$, where
the $e_i$ are as defined at the end of §3. Let the image of $N$ under this embedding
generate the normal subgroup $N' \subset \{a^n\} * \{b^n\}$. By the remark at the end
of §3, $G/N$ embeds in $\{a^n\} * \{b^n\}/N'$. But $N'$ is enumerable by Lemma 1.2(d),
since it is generated by the image of an enumerable set under a recursive
map. Therefore, by hypothesis, $N'$ is a benign normal subgroup. Lemma 4.2 then
shows that $\{a^n\} * \{b^n\}/N'$ can be embedded in an f.p. group.


We conclude this section by establishing several basic properties of benign 
subgroups. 


4.4. Lemma. Let $E, F \subset G$ be benign subgroups of $G$. Then:


(a) $E \cap F$ is a benign subgroup;
(b) $|E \cup F|$ (“the sum of $E$ and $F$ in $G$”) is a benign subgroup.


PROOF. (a) Let $\varphi_1, \varphi_2 : G \to G *_E G$ and $\varphi_1', \varphi_2' : G \to G *_F G$ be the structure
homomorphisms. Let $M_1$ and $M_2$ be f.p. groups such that $G *_E G \subset M_1$, and
$G *_F G \subset M_2$. We identify $\varphi_1(G) \subset M_1$ and $\varphi_1'(G) \subset M_2$ with $G$, and construct
the group $M_1 *_G M_2$. This group is finitely presented (since it suffices to add to
the relations in $M_1$ and $M_2$ the relations $\varphi_1(g_i) = \varphi_1'(g_i)$ for a finite system of
generators of $G$). Let $\psi_1, \psi_2 : M_1, M_2 \to M_1 *_G M_2$ be the structure embeddings.
We set $K = M_1 *_G M_2$ and $L = \psi_1 \circ \varphi_2(G)$, and we embed $G$ in $K$ by means
of $\psi_2 \circ \varphi_2'$.

We claim that $G \cap L = E \cap F$ (as a subgroup of $G$ in $K$). In fact, $\psi_1(M_1) \cap
\psi_2(M_2) = G$ with its canonical embedding in $M_1 *_G M_2$. If we take only $\varphi_2(G)$
in $M_1$ and $\varphi_2'(G)$ in $M_2$, then intersecting with the amalgamation $G$ gives $E$
and $F$, respectively, and intersecting $\varphi_2(G)$ with $\varphi_2'(G)$ gives $E \cap F$.

(b) The subgroups $\varphi_1(|E \cup F|)$ and $\varphi_2(G)$ have the same intersection with
the amalgamation in $G *_E G$, since they actually contain it. Hence, by 2.3(d),
we have $|\varphi_1(|E \cup F|) \cup \varphi_2(G)| \cap \varphi_1(G) = |E \cup F|$ in $G *_E G$, i.e., since $E$ is the
amalgamation,

$$|\varphi_1(F) \cup \varphi_2(G)| \cap \varphi_1(G) = |E \cup F|.$$

Similarly, we have

$$|\varphi_1'(E) \cup \varphi_2'(G)| \cap \varphi_1'(G) = |E \cup F|$$

in $G *_F G$. The notation is compatible with the fact that these two intersections
are identified in the amalgamation of the product $M_1 *_G M_2$, which is
constructed as in part (a).

Applying 2.3(d) to this product, we find that

$$\big|\psi_1(|\varphi_1(F) \cup \varphi_2(G)|) \cup \psi_2(|\varphi_1'(E) \cup \varphi_2'(G)|)\big| \cap G = |E \cup F|.$$

But the group $|\psi_1 \circ \varphi_2(G) \cup \psi_2 \circ \varphi_2'(G)| \cap G$ obviously contains the right-hand
side and is contained in the left-hand side of this equality, so that it also
coincides with $|E \cup F|$.

Finally, $|\psi_1 \circ \varphi_2(G) \cup \psi_2 \circ \varphi_2'(G)|$ is a finitely generated subgroup of the
finitely presented group $M_1 *_G M_2$. The proof is complete.


4.5. Lemma. Let $G$ and $H$ be f.g. subgroups of f.p. groups. Then any
homomorphism from $G$ to $H$ takes benign subgroups of $G$ to benign subgroups of $H$.


PROOF.


(a) If $A \subset G$ is benign, then $A \times \{1\} \subset G \times H$ is also benign, since, given
an embedding of $(G, A)$ in $(K, L)$ as in 4.1(a), we can construct the obvious
embedding of $(G \times H, A \times \{1\})$ in $(K \times M, L \times \{1\})$, where $M$ is the f.p. group
containing $H$, which also satisfies the conditions in 4.1(a). Conversely, if
$A \times \{1\} \subset G \times H$ is benign, then from an embedding of $(G \times H, A \times \{1\})$ in $(K, L)$
as in 4.1(a) we construct the corresponding embedding of $(G, A)$ in $(K, L \cap G \times \{1\})$.

(b) Now let $\varphi : G \to H$ be any homomorphism, let $\Gamma$ be its graph, and let
$A \subset G$ be a benign subgroup. Then in $G \times H$ we have

$$\{1\} \times \varphi(A) = \big|(|A \times \{1\} \cup \{1\} \times H| \cap \Gamma) \cup G \times \{1\}\big| \cap \{1\} \times H.$$

It is clear from the assumptions regarding $G$ and $H$ that $\Gamma$ is a benign subgroup
in $G \times H$. By part (a), the other subgroups on the right in the formula are also
benign. By Lemma 4.4, $\{1\} \times \varphi(A)$ is a benign subgroup. Hence, $\varphi(A)$ is also
benign.


5 Bounded Systems of Generators


5.1. Let $G = |\{a_1, \dots, a_n\}|$, $n \ge 1$, be the group freely generated by the $a_i$.
We call a subset $R \subset G$ bounded if there exists an $r \ge 1$ such that any element
in $R$ can be represented in the form $a_{i_1}^{x_1} \cdots a_{i_r}^{x_r}$, $x_i \in \mathbb{Z}$. In this section we prove
the following special case of the hypothesis of Proposition 4.3:


5.2. Proposition. If the subgroup $H' \subset G'$ is generated by a bounded
enumerable subset $R' \subset G'$, then it is benign.


Corollary. The same is true if $G$ is an f.g. subgroup of an f.p. group (using
Lemma 4.5).


In the next section we show how the general case follows from this special 
case. 
The proof of 5.2 consists of a series of reduction steps. 


5.3. First reduction. In the free group $G = |\{a_1, b_1, c_1; \dots; a_{rn}, b_{rn}, c_{rn}\}|$ we
shall consider a set of “layered” words of the form

$$R = \{a_1^{x_1} b_1 c_1^{x_1} \cdots a_{rn}^{x_{rn}} b_{rn} c_{rn}^{x_{rn}}\}$$

and the subgroup $H \subset G$ it generates. We shall later show that if $R$ is
enumerable, then $H$ is benign. This is a special case of 5.2 to which the general case
reduces using the following technique.

Suppose we are given $G'$ and $R'$ as in 5.1. For each element $g' = a_{i_1}^{x_1} \cdots a_{i_r}^{x_r} \in R'$
we construct an element $g \in G$ as follows. We represent $g'$ in the form

$$\prod_{i=1}^{n} a_i^{x_{1,i}} \prod_{i=1}^{n} a_i^{x_{2,i}} \cdots \prod_{i=1}^{n} a_i^{x_{r,i}},$$

where

$$x_{k,i} = \begin{cases} x_k, & \text{for } i = i_k, \\ 0, & \text{for } i \ne i_k. \end{cases}$$

We then set

$$g = \Big(\prod_{i=1}^{n} a_i^{x_{1,i}} b_i c_i^{x_{1,i}}\Big) \Big(\prod_{i=1}^{n} a_{n+i}^{x_{2,i}} b_{n+i} c_{n+i}^{x_{2,i}}\Big) \cdots \Big(\prod_{i=1}^{n} a_{(r-1)n+i}^{x_{r,i}} b_{(r-1)n+i} c_{(r-1)n+i}^{x_{r,i}}\Big).$$

If $R'$ is enumerable, then the set $R$ of all elements $g$ obtained from all the
$g' \in R'$ is enumerable.

We consider the surjective homomorphism $\varphi : G \to G'$ given by $\varphi(a_{nj+i}) =
a_i$ ($1 \le i \le n$, $0 \le j \le r - 1$), $\varphi(b_i) = \varphi(c_i) = 1$ for all $i = 1, \dots, rn$. Clearly
$\varphi(R) = R'$, and hence $\varphi(H) = H'$. It then follows from Lemma 4.5 that if $H$ is
benign in $G$, then $H'$ is benign in $G'$.

5.4. Using the theorem that all enumerable sets are Diophantine.
From this point on, we fix a pair ($G$, enumerable $R$), as in 5.3. We shall
write $l \ge 1$ in place of $rn$. We define the set $E \subset \mathbb{Z}^{l+1}$ by the condition

$$R = \{a_0^{x_0} b_0 c_0^{x_0} \cdots a_l^{x_l} b_l c_l^{x_l} \mid (x_0, \dots, x_l) \in E\}.$$

It is not hard to see that $R$ is enumerable if and only if $E$ is enumerable.
We now show that $E$ can be represented as the projection onto the first $l + 1$
coordinates of a set

$$\bigcap_{s=1}^{N} E_s \subset \mathbb{Z}^{l+1} \times \mathbb{Z}^{m-l}, \quad m \ge l + 2,$$

where each of the $E_s$ is defined by an equation of one of the following forms:

$$x_i = c, \quad c \in \mathbb{Z};$$
$$x_i = x_j, \quad 0 \le i, j \le m;$$
$$x_k = x_j + x_i, \quad l + 1 \le k < j < i \le m;$$
$$x_k = x_j x_i, \quad l + 1 \le k < j < i \le m.$$


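In fact, let $\varepsilon_0, \dots, \varepsilon_l \in \{1, -1\}$, and let $\varepsilon = (\varepsilon_0, \dots, \varepsilon_l)$. We consider the
enumerable sets

$$E^{\varepsilon} = \{(x_0, \dots, x_l) \in (\mathbb{Z}^+ \cup \{0\})^{l+1} \mid (\varepsilon_0 x_0, \dots, \varepsilon_l x_l) \in E\}.$$

By the fundamental theorem in Chapter VI, there exist polynomials $P^{\varepsilon}$ with
integral coefficients such that

$E^{\varepsilon}$ = the projection of the 0-level of $P^{\varepsilon}$ in $(\mathbb{Z}^+ \cup \{0\})^{l+1} \times (\mathbb{Z}^+)^{n-l}$
onto the first $l + 1$ coordinates $(x_0, \dots, x_l)$.

Here we can take $n$ large enough that the sets of variables that actually occur
in $P^{\varepsilon}$ and in $P^{\varepsilon'}$ and that “drop out” in the projection do not intersect if
$\varepsilon \ne \varepsilon'$. If we add the $(n+1)2^2$ new variables $y_{ij}$ ($0 \le i \le n$, $j = 1, 2, 3, 4$) to
the variables that drop out in the projection, we find that $E$ can be represented
as the projection onto the first $l + 1$ coordinates of the 0-level of the following
polynomial, where the 0-level is now in $\mathbb{Z}^{l+1} \times \mathbb{Z}^{n+(n+1)2^2-l}$:

$$Q = \prod_{\varepsilon} \Big[ \big(P^{\varepsilon}(\varepsilon_0 x_0, \dots, \varepsilon_l x_l, x_{l+1}, \dots, x_n)\big)^2 + \sum_{i=0}^{l} \Big(\varepsilon_i x_i - \sum_{j=1}^{4} y_{ij}^2\Big)^2 + \sum_{i=l+1}^{n} \Big(x_i - 1 - \sum_{j=1}^{4} y_{ij}^2\Big)^2 \Big].$$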

Finally, in order to represent the set $Q = 0$ as a projection of an intersection
$\bigcap_s E_s$ of the required type, we introduce additional variables as follows. Let
$x_0, \dots, x_t$ be the variables that occur in $Q$. Instead of $Q = 0$ we write $Q_1 = Q_2$,
where $Q_1$ is the sum of the monomials in $Q$ with positive coefficients, and $Q_2$
is the sum of the monomials with negative coefficients, taken with the opposite sign. Then

0-level of $Q$ = a projection of $(x_{t+1} = Q_1) \cap (x_{t+2} = Q_2) \cap (x_{t+1} = x_{t+2})$.

If $Q_1$ and $Q_2$ are constants or variables, this gives us the desired representation.
Otherwise, we write, say, $Q_1$ in the form $Q_1' + Q_1''$ or $Q_1' \cdot Q_1''$, and after
introducing two more variables, we have, for example,

$(x_{t+1} = Q_1' + Q_1'')$ = a projection of $(x_{t+3} = Q_1') \cap (x_{t+4} = Q_1'') \cap (x_{t+1} = x_{t+3} + x_{t+4})$.

We complete the proof by induction on the sum of the absolute values of the
coefficients and on the degree of $Q$.
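The reduction just described can be made concrete. The following Python sketch (not part of the original text) flattens an equation $Q = 0$ into elementary equations of the four admissible kinds. The dictionary encoding of polynomials and the names `flatten` and `extends_to_solution` are assumptions made for illustration, and the ordering constraints on indices from the list of admissible forms are ignored.

```python
from itertools import count

def flatten(poly, num_vars):
    """Rewrite poly(x_0, ..., x_{num_vars-1}) = 0 as elementary equations.

    poly maps exponent tuples to nonzero integer coefficients.
    Returns a list of equations over variable indices, each one of
      ("const", v, c)   x_v = c
      ("add", w, u, v)  x_w = x_u + x_v
      ("mul", w, u, v)  x_w = x_u * x_v
      ("eq", u, v)      x_u = x_v   (always the last equation)
    """
    fresh = count(num_vars)          # indices of auxiliary variables
    eqs = []

    def new(kind, *args):
        w = next(fresh)
        eqs.append((kind, w, *args))
        return w

    def monomial(exps, coeff):
        v = new("const", coeff)      # coeff is a positive integer here
        for i, e in enumerate(exps):
            for _ in range(e):
                v = new("mul", v, i)
        return v

    def side(terms):
        if not terms:
            return new("const", 0)
        acc = monomial(*terms[0])
        for term in terms[1:]:
            acc = new("add", acc, monomial(*term))
        return acc

    q1 = side([(e, c) for e, c in poly.items() if c > 0])    # Q_1
    q2 = side([(e, -c) for e, c in poly.items() if c < 0])   # Q_2
    eqs.append(("eq", q1, q2))
    return eqs


def extends_to_solution(eqs, xs):
    """True iff xs lies in the projection of the solution set of eqs."""
    val = dict(enumerate(xs))
    for kind, *args in eqs:
        if kind == "const":
            val[args[0]] = args[1]
        elif kind == "add":
            val[args[0]] = val[args[1]] + val[args[2]]
        elif kind == "mul":
            val[args[0]] = val[args[1]] * val[args[2]]
        else:                        # "eq": compare the two sides
            return val[args[0]] == val[args[1]]
```

For instance, `flatten({(2, 0): 1, (0, 1): -1, (0, 0): -1}, 2)` encodes $x_0^2 - x_1 - 1 = 0$. Since every auxiliary variable is forced by the equation that introduces it, a point belongs to the projection exactly when the final equality holds.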


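5.5. Second reduction. We now assume that along with the pair $(G, R)$ described
in 5.3, we have fixed a representation of $E$ as the projection of $\bigcap_{s=1}^{N} E_s$, as in 5.4. In
this subsection we show that the subgroup $H \subset G$ generated by $R$ is benign if
all of the following subgroups $H_s \subset \widetilde{G}$, $s = 1, \dots, N$, are benign:

$$\widetilde{G} = |\{a_0, b_0, c_0, \dots, a_m, b_m, c_m;\ \bar a_0, \bar b_0, \bar c_0, \dots, \bar a_l, \bar b_l, \bar c_l\}|,$$

$$H_s = \Bigl|\Bigl\{\Bigl(\prod_{i=l+1}^{m} a_i^{x_i} b_i c_i^{x_i}\Bigr)^{-1} \prod_{i=0}^{l} \bar a_i^{x_i} \bar b_i \bar c_i^{x_i} \prod_{i=0}^{l} a_i^{x_i} b_i c_i^{x_i};\ (x_0, \dots, x_m) \in E_s\Bigr\}\Bigr|.$$

To show this, we first set

$$a(x_0, \dots, x_m) = \Bigl(\prod_{i=l+1}^{m} a_i^{x_i} b_i c_i^{x_i}\Bigr)^{-1} \Bigl(\prod_{i=0}^{l} \bar a_i^{x_i} \bar b_i \bar c_i^{x_i}\Bigr) \prod_{i=0}^{l} a_i^{x_i} b_i c_i^{x_i}. \qquad (1)$$

The set of words $\{a(x_0, \dots, x_m);\ (x_0, \dots, x_m) \in \mathbf{Z}^{m+1}\}$ is free, since when we
join two such words (or when we join such a word with the inverse of another
such word), any cancellation cannot involve the “middle part” of each word,
which consists of the symbols $\bar a_i, \bar b_i, \bar c_i$.

It hence follows that

$$\bigcap_{s=1}^{N} H_s = \Bigl|\Bigl\{a(x_0, \dots, x_m);\ (x_0, \dots, x_m) \in \bigcap_{s=1}^{N} E_s\Bigr\}\Bigr|,$$

and the subgroup $\widetilde{H} = \bigcap_{s=1}^{N} H_s \subset \widetilde{G}$ is benign if all of the $H_s$ are benign.
Finally, we have

$$|\widetilde{H} \cup \{a_{l+1}, b_{l+1}, c_{l+1}, \dots, a_m, b_m, c_m;\ \bar a_0, \bar b_0, \bar c_0, \dots, \bar a_l, \bar b_l, \bar c_l\}|$$
$$= \Bigl|\Bigl\{\prod_{i=0}^{l} a_i^{x_i} b_i c_i^{x_i};\ (x_0, \dots, x_l) \in E = \text{projection of } \bigcap_{s=1}^{N} E_s\Bigr\} \cup \{a_{l+1}, b_{l+1}, c_{l+1}, \dots;\ \dots, \bar a_l, \bar b_l, \bar c_l\}\Bigr|,$$

so that

$$H = |\widetilde{H} \cup \{a_{l+1}, b_{l+1}, \dots, \bar b_l, \bar c_l\}| \cap |\{a_0, \dots, c_l\}|.$$

Therefore, $H$ is benign whenever $\widetilde{H}$ is benign.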


5.6. Construction of the group $K$. We use the criterion 4.1(a) to verify that
the $H_s \subset \widetilde{G}$ are benign subgroups. That is, we explicitly construct a finitely
presented group $K \supset \widetilde{G}$ and finitely generated subgroups $L_s \subset K$ such that
$L_s \cap \widetilde{G} = H_s$ for all $s = 1, \dots, N$. We construct $K$ as a multiple HNN-extension
of $\widetilde{G}$.

(a) The first HNN-extension. We set

$$K_0 = |\widetilde{G} \cup \{t_0, \dots, t_m\} : R_0|,$$

where $R_0$ is the set of relations

$\{t_i^{-1} b_i t_i = a_i b_i c_i$ and $t_i^{-1} \bar b_i t_i = \bar a_i \bar b_i \bar c_i$, for $i = 0, \dots, m$;
the $t_i$ commute with all the other generators of $\widetilde{G}\}$. (2)

(b) The second HNN-extension. We set

$$K = |K_0 \cup \{t_{ijk} \mid l+1 \le i, j, k \le m;\ i \ne k,\ j \ne k\} : R|,$$

where $R$ is the set of relations

$\{t_{ijk}^{-1} b_i t_{ijk} = a_i b_i c_i$, $t_{ijk}^{-1} c_j t_{ijk} = t_k c_j$;
the $t_{ijk}$ commute with $t_k$ and with the other generators of $\widetilde{G}\}$. (3)

Unlike what we saw in 5.6(a), here it is not completely obvious that $K$
is an HNN-extension of $K_0$. To check this it suffices to show that the map
$\varphi_{ijk}$ ($i, j, k$ fixed, $i \ne k$, $j \ne k$) from the set $\{$generators of $\widetilde{G}\} \cup \{t_k\}$ that takes

$$b_i \mapsto a_i b_i c_i, \qquad c_j \mapsto t_k c_j, \qquad t_k \mapsto t_k,$$

and leaves the other generators of $\widetilde{G}$ fixed, extends to an automorphism of the
subgroup $|\widetilde{G} \cup \{t_k\}| \subset K_0$. We have

$$|\widetilde{G} \cup \{t_k\}| = |\widetilde{G} \cup \{t_k\} : \cdots,\ t_k^{-1} b_i t_k = b_i,\ t_k^{-1} c_j t_k = c_j|,$$

where the $\cdots$ stands for relations that do not involve $b_i$ and $c_j$, and so are
taken to themselves under $\varphi_{ijk}$. On the other hand, the two relations that are
written out are taken to relations that follow from the defining relations in $K_0$:
the first goes to

$$t_k^{-1} a_i b_i c_i t_k = a_i b_i c_i,$$

and the second goes to

$$t_k^{-1} t_k c_j t_k = t_k c_j.$$

It remains to use the stipulation that $i \ne k$ and $j \ne k$.

It is clear from the definition of $K$ that $K$ is finitely presented. It follows
from the properties of HNN-extensions that $\widetilde{G} \subset K$.


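5.7. Construction of the subgroups $L_s \subset K$. The form of $L_s$ will depend on
the equation defining the set $E_s$ (see 5.4). We define a large number of groups,
which will include all the $L_s$:

$$L_i^c = |\{a(0, \dots, 0, c, 0, \dots, 0),\ t_r\ (r \ne i)\}| \quad (c \text{ in the } i\text{th place}),$$
$$L_{ij} = |\{a(0, \dots, 0),\ t_i t_j,\ t_r\ (r \ne i, j)\}|,$$
$$L_{ijk}^{+} = |\{a(0, \dots, 0),\ t_i t_k,\ t_j t_k,\ t_r\ (r \ne i, j, k)\}|,$$
$$L_{ijk}^{\times} = |\{a(0, \dots, 0),\ t_{ijk},\ t_{jik},\ t_r\ (r \ne i, j, k)\}|,$$

and analogously, in the notation of 5.5,

$$H_i^c = |\{a(x_0, \dots, x_m);\ x_i = c\}|,$$
$$H_{ij} = |\{a(x_0, \dots, x_m);\ x_i = x_j\}|,$$
$$H_{ijk}^{+} = |\{a(x_0, \dots, x_m);\ x_k = x_i + x_j\}|,$$
$$H_{ijk}^{\times} = |\{a(x_0, \dots, x_m);\ x_k = x_i \cdot x_j\}|.$$

The $L_s$ are clearly finitely generated. It remains to perform one final series of
verifications: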


5.8. $H_i^c = \widetilde{G} \cap L_i^c$, $H_{ij} = \widetilde{G} \cap L_{ij}$, and so on.
First of all, it follows from (1), (2), and (3) that

$$t_i^{-1} a(x_0, \dots, x_m) t_i = a(x_0, \dots, x_{i-1}, x_i + 1, x_{i+1}, \dots, x_m), \qquad (4)$$
$$t_{ijk}^{-1} a(x_0, \dots, x_m) t_{ijk} = a(y_0, \dots, y_m), \qquad (5)$$

where $y_i = x_i + 1$, $y_k = x_k + x_j$, and $y_s = x_s$ for $s \ne i, k$. (To verify (5) recall
that since $k \ge l+1$, it follows that $t_k$ commutes with the middle part of the
word $a(x_0, \dots, x_m)$, which consists of $\bar a_i, \bar b_i, \bar c_i$, $i \le l$.)

It hence follows that

$$L_i^c = |H_i^c \cup \{t_r \mid r \ne i\}|,$$
$$L_{ij} = |H_{ij} \cup \{t_i t_j,\ t_r \mid r \ne i, j\}|,$$
$$L_{ijk}^{+} = |H_{ijk}^{+} \cup \{t_i t_k,\ t_j t_k,\ t_r \mid r \ne i, j, k\}|,$$
$$L_{ijk}^{\times} = |H_{ijk}^{\times} \cup \{t_{ijk},\ t_{jik},\ t_r \mid r \ne i, j, k\}|.$$

In fact, the inclusions $\subset$ are obvious. Next, if we begin with $a(x_0, \dots, x_m)$ and
conjugate by $t_r$, it follows by (4) that we can vary the $r$th coordinate arbitrarily.
This immediately gives the inclusion $L_i^c \supset H_i^c$, and hence the first required
equality. The second equality is obtained analogously.




The third equality: conjugating by $t_i t_k$ increases the $i$th and $k$th coordinates
by 1, and conjugating by $t_j t_k$ increases the $j$th and $k$th coordinates by 1, so
that we can obtain any vector with $x_k = x_i + x_j$ starting from a vector with
zeros in these places.

The fourth equality: conjugating by $t_{ijk}$ increases $x_i$ by 1 and increases $x_k$
by $x_j$, and conjugating by $t_{jik}$ increases $x_j$ by 1 and $x_k$ by $x_i$. Hence, we can
obtain any vector with $x_k = x_i \cdot x_j$ starting from the zero vector.
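The arithmetic behind the fourth equality can be checked mechanically. The Python sketch below (an illustration, not part of the original argument) models only the effect of the two conjugations on exponent vectors; the function names are hypothetical, and only nonnegative coordinates are treated (negative ones would use the inverse conjugations).

```python
def conjugate(x, i, j, k):
    """Effect of conjugating by t_ijk on the exponent vector: x_i += 1, x_k += x_j."""
    y = list(x)
    y[i] += 1
    y[k] += x[j]
    return y


def reach(m, i, j, k, xi, xj):
    """Start from the zero vector of length m + 1 and reach x_i = xi, x_j = xj."""
    x = [0] * (m + 1)
    for _ in range(xi):
        x = conjugate(x, i, j, k)    # t_ijk: x_k is unchanged while x_j = 0
    for _ in range(xj):
        x = conjugate(x, j, i, k)    # t_jik: each step adds x_i to x_k
    return x
```

Each move preserves the relation $x_k = x_i \cdot x_j$, because $(x_i + 1)x_j = x_i x_j + x_j$; this is why every vector produced this way satisfies the product equation.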

This new characterization of the groups $L_s$ shows that $L_s \cap \widetilde{G} \supset H_s$ for
all $s$. It remains to prove the converse.

To do this, we note that using (4) and (5), we can represent any element in
$L_s$ in the form $Th$, where $T \in |\{t_i, t_{ijk}\}|$ (here the set of admissible indices $i$ and
$ijk$ depends on $s$) and $h \in H_s$. This follows by the same argument as above.
But by Proposition 2.7(a), all the $\{t_i, t_{ijk}\}$ generate a free subgroup that has a
trivial intersection with $\widetilde{G}$ (see the proof of 2.7(a)). Consequently, if $Th \in \widetilde{G}$, it
follows that $T = 1$ and $h \in H_s$, which completes the proof.


6 End of the Proof 


6.1. In this section we finish the verification of Proposition 4.3, and hence the
proof of Higman’s theorem.

Let $G = |\{a, b\}|$, and let $H \subset G$ be an enumerable subgroup. We shall show
that $H$ is benign. The first step is to reduce the problem to proving that a
certain special subgroup

$$H' \subset G',$$

which does not depend on $H$, is benign. To define $H'$ we first introduce the
following recursive enumeration $\gamma : \mathbf{Z}^+ \to G$ (which covers each $g \in G$ infinitely
many times):

$$\gamma\Bigl(\prod_{i=0}^{\infty} p_i^{m_i}\Bigr) = \prod_{i=0}^{\infty} a^{m_{4i} - m_{4i+1}} b^{m_{4i+2} - m_{4i+3}}.$$

We then set

$$G' = |\{a, b, t, v, c, d, e\}|;$$
$$\tau : S(\{a, b, a^{-1}, b^{-1}\}) \to G' : \prod_{i \ge 0} a^{m_{2i}} b^{m_{2i+1}} \mapsto t \prod_{i \ge 0} (v^{-i} a v^{i})^{m_{2i}} (v^{-i} b v^{i})^{m_{2i+1}},$$
$$H' = |\{\tau(g) c^n d e^n \mid g \in S(\{a, b, a^{-1}, b^{-1}\}),\ n \in \mathbf{Z}^+,\ g = \gamma(n)\}| \subset G'.$$

The formula for $\tau$ defines $\tau$ on words that are not necessarily reduced, and
reducing a word can change its image under $\tau$. Also note that a generator
$\tau(g) c^n d e^n$ of $H'$ is uniquely determined by $n$.
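To see how the enumeration $\gamma$ covers the free group, one can compute it directly from prime factorizations. The following Python sketch is only an illustration: it assumes the indexing $p_0 = 2$, $p_1 = 3, \dots$, returns $\gamma(n)$ as a reduced list of (letter, exponent) blocks, and uses hypothetical function names.

```python
def prime_exponents(n):
    """Exponents (m_0, m_1, ...) of n = prod p_i^{m_i}, assuming p_0 = 2, p_1 = 3, ..."""
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
        p += 1
        while any(p % q == 0 for q in range(2, int(p ** 0.5) + 1)):
            p += 1                    # advance to the next prime
    return exps


def gamma(n):
    """gamma(n) as a reduced word: a list of (letter, nonzero exponent) blocks."""
    m = prime_exponents(n)
    m += [0] * (-len(m) % 4)          # pad so that exponents come in groups of four
    word = []
    for i in range(0, len(m), 4):
        for letter, e in (("a", m[i] - m[i + 1]), ("b", m[i + 2] - m[i + 3])):
            if e == 0:
                continue
            if word and word[-1][0] == letter:
                e += word.pop()[1]    # merge with the preceding block
                if e == 0:
                    continue
            word.append((letter, e))
    return word
```

For example, `gamma(2)`, `gamma(12)`, and `gamma(11)` all represent the element $a$, which illustrates why every element of $G$ is covered infinitely many times.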
6.2. Lemma. If $H' \subset G'$ is a benign subgroup, then any enumerable subgroup
$H \subset G$ is benign.

PROOF.
(a) We set

$$H'' = |\{\tau(h) c^n d e^n \mid \text{image of } h \in H,\ n \in \mathbf{Z}^+,\ h = \gamma(n)\}| \subset H'.$$

Then

$$H'' = H' \cap |\{a, b, t, v, c^n d e^n \mid n \in \gamma^{-1}(H)\}|.$$

In fact, the inclusion $\subset$ is obvious. The converse follows because the set of images
of the elements $c^n d e^n$, $n \ge 1$, in the quotient of $G'$ by the kernel generated by
$a, b, t$, and $v$, is free. Hence, in any reduced word in the generators $\tau(g) c^n d e^n$,
the sequence of $n$’s can be uniquely recovered from the word, and if all the $n$’s
lie in $\gamma^{-1}(H)$, it follows that the word lies in $H''$.

Thus, $H''$ is the intersection of $H'$ with the subgroup generated by a bounded
enumerable set of generators (since $\gamma^{-1}(H)$ is enumerable whenever $H$ is).
Consequently, $H''$ is benign if $H'$ is benign.

(b) We set

$$H''' = |\{\tau(h) \mid h \in H\}| \subset G'.$$

It is easy to see that

$$|H''' \cup \{c, d, e\}| = |H'' \cup \{c, d, e\}|.$$

Hence,

$$H''' = |H'' \cup \{c, d, e\}| \cap |\{a, b, v, t\}|.$$

By Lemma 4.4, $H'''$ is benign if $H''$ is benign.

(c) Finally, we consider the homomorphism $\varphi : G' \to G$ that takes $a$ to $a$, $b$
to $b$, and $t, v, c, d$, and $e$ to 1. Obviously, $\varphi(H''') = H$. By Lemma 4.5, $H$ is
benign if $H'''$ is benign.


6.3. We now prove that the subgroup $H' \subset G'$ is benign. To do this, we construct
a commutative diagram of group embeddings

$$\begin{array}{ccccc} G' & \longrightarrow & K' & \longrightarrow & K \\ \uparrow & & \uparrow & & \uparrow \\ H' & \longrightarrow & L' & \longrightarrow & L \end{array}$$

with the following properties:

(a) $K$ is defined by a finite set of generators and a bounded enumerable set
of relations; $L$ is generated by a bounded enumerable set of words in the
generators of $K$.

(b) $L' \to L$ is an isomorphism.

(c) $H' = G' \cap L'$ in $K'$.

It will then follow that $H'$ is benign. In fact, let $K = F/R$, where $F$ is the
free group generated by a finite system of generators of $K$, $R_0$ is a bounded
enumerable set of relations between these generators, and $R$ is the normal
subgroup generated by these relations. It follows from Proposition 5.2 that
$R_0$ generates a benign subgroup $R$ in $F$, and then Lemma 4.2 implies that
$K = F/R$ can be embedded in an f.p. group $M$. When we embed $K$ in $M$,
the bounded enumerable set of generators of $L$ remains a bounded enumerable
set in $M$ (relative to the generators of $M$), and hence $L \subset M$ is benign by
the corollary to Proposition 5.2. Therefore, by (b) and (c) we have that the
subgroup $H' = G' \cap L$ is benign as a subgroup of $M$ whenever $G'$ and $L$ are
benign. Hence, there is an embedding of $(M, H')$ in $(\widetilde{M}, \widetilde{H})$ such that $\widetilde{M}$ is
finitely presented, $\widetilde{H}$ is finitely generated, and $H' = \widetilde{H} \cap M$. This embedding
induces an embedding of the pair $(G', H')$ in $(\widetilde{M}, \widetilde{H})$ with the same properties.
Consequently, $H'$ is also benign in $G'$.

It remains to construct the diagram of embeddings with properties (a), (b),
and (c).


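6.4. The group $K'$. This will be a multiple HNN-extension of $G'$, which, as in
Proposition 2.7, we define using four countable sequences of nontrivial isomorphisms
of the subgroup $|\{t, c, d, e, v^{-i} a v^{i}, v^{-i} b v^{i} \mid i \ge 0\}| \subset G'$ into $G'$. Since
the elements listed here freely generate this subgroup, it is sufficient to indicate
where our isomorphisms take these elements. These isomorphisms will be
induced in $K'$ by conjugation by four sequences of generators $x_i, \bar x_i, y_i$, and
$\bar y_i$, $i \ge 0$ (instead of the $t_i$, $i \in I$, in §2). The following table gives the action
of these generators. We use the notation $a_i = v^{-i} a v^{i}$, $b_i = v^{-i} b v^{i}$, $p_j$ = the $j$th
prime number. The element in the table, say, in the $c$-row and the $\bar x_i$-column,
is $\bar x_i^{-1} c \bar x_i$.

$$\begin{array}{c|cccc}
 & x_i & \bar x_i & y_i & \bar y_i \\ \hline
t & t a_i & t a_i^{-1} & t b_i & t b_i^{-1} \\
c & c^{p_{4i}} & c^{p_{4i+1}} & c^{p_{4i+2}} & c^{p_{4i+3}} \\
d & d & d & d & d \\
e & e^{p_{4i}} & e^{p_{4i+1}} & e^{p_{4i+2}} & e^{p_{4i+3}} \\
a_j & a_i^{-1} a_j a_i,\ j < i & a_i a_j a_i^{-1},\ j < i & b_i^{-1} a_j b_i,\ j \le i & b_i a_j b_i^{-1},\ j \le i \\
 & a_j,\ j \ge i & a_j,\ j \ge i & a_j,\ j > i & a_j,\ j > i \\
b_j & a_i^{-1} b_j a_i,\ j < i & a_i b_j a_i^{-1},\ j < i & b_i^{-1} b_j b_i,\ j < i & b_i b_j b_i^{-1},\ j < i \\
 & b_j,\ j \ge i & b_j,\ j \ge i & b_j,\ j \ge i & b_j,\ j \ge i
\end{array}$$

We finally set

$$K' = |G' \cup \{x_i, \bar x_i, y_i, \bar y_i \mid i \ge 0\} : \text{the relations in the table}|,$$

and we take $G' \to K'$ to be the natural embedding.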
6.5. The group $L'$. We set

$$L' = |\{tcde, x_i, \bar x_i, y_i, \bar y_i \mid i \ge 0\}| \subset K',$$

and we take $L' \to K'$ to be the natural embedding. In Section 6.7 we shall verify
that $H'$ is embedded in $L'$ (as a subgroup of $K'$, in view of the commutativity
of the diagram).


6.6. The groups $K$ and $L$. We set

$$K = |G' \cup \{u_1, u_2, u_3, u_4, v_1, v_2, v_3, v_4\} : R|,$$

where the relations $R$ and the embedding $K' \to K$ are both defined by the
conditions

$R$ = the image of the relations in the table after making the substitutions

$$x_i \mapsto u_1^{-i} v_1 u_1^{i}, \quad \bar x_i \mapsto u_2^{-i} v_2 u_2^{i}, \quad y_i \mapsto u_3^{-i} v_3 u_3^{i}, \quad \bar y_i \mapsto u_4^{-i} v_4 u_4^{i};$$

$K' \to K$ is the homomorphism that is the identity on $G'$ and acts by these
substitutions on the other generators.

The homomorphism $K' \to K$ is an embedding. In fact, the elements $u_j^{-i} v_j u_j^{i}$
are free in $|\{u_j, v_j\}|$, so that $K$ can be considered as the free product of $K'$ and
$|\{u_j, v_j \mid 1 \le j \le 4\}|$ with the amalgamation given by the above substitutions
(here we take into account Proposition 2.7(a)).

Finally, we set

$L$ = the image of $L'$ under the embedding $K' \to K$.


6.7. The diagram has now been constructed. It follows immediately from the
definitions that it satisfies 6.3(a) and (b). It remains to show that $H' = G' \cap L'$
in $K'$.

(a) We set $[n] = \tau(g) c^n d e^n$ for $n \in \mathbf{Z}^+$ and $g = \gamma(n)$ in the notation of 6.1. We
recall that $H'$ is generated by all the $[n]$ in $G'$, and hence in $K'$ as well.

The table of relations in $K'$ was composed in such a way that the following
relations would be fulfilled:

$$x_i^{-1} [n] x_i = [p_{4i} n], \qquad \bar x_i^{-1} [n] \bar x_i = [p_{4i+1} n],$$
$$y_i^{-1} [n] y_i = [p_{4i+2} n], \qquad \bar y_i^{-1} [n] \bar y_i = [p_{4i+3} n].$$

For example, we verify the first relation. Let $n = \prod_j p_j^{m_j}$. Then, according to the
definitions,

$$\gamma(n) = \prod_j a^{m_{4j} - m_{4j+1}} b^{m_{4j+2} - m_{4j+3}},$$
$$[n] = t \prod_j a_j^{m_{4j} - m_{4j+1}} b_j^{m_{4j+2} - m_{4j+3}} c^n d e^n,$$

so that by the first column of the table in 6.4,

$$x_i^{-1} [n] x_i = t a_i \, a_i^{-1} \prod_{j < i} (\cdots)\, a_i \prod_{j \ge i} (\cdots)\, c^{p_{4i} n} d e^{p_{4i} n} = [p_{4i} n].$$

If we further take into account that $[1] = tcde \in L'$, we may conclude from these
conjugation formulas that $[n] \in L'$ for all $n$, and that $H' \subset L'$, as promised
in 6.5. Moreover, $|H' \cup \{x_i, \bar x_i, y_i, \bar y_i \mid i \ge 0\}| = L'$, since the inclusion $\subset$ has been
verified, and the inclusion $\supset$ is obvious.


(b) We now show that in $K'$ we have

$$|H' \cup \{x_i, \bar x_i, y_i, \bar y_i \mid i \ge 0\}| \cap G' = H'.$$

Since $K'$ is an HNN-extension of $G'$, it suffices to show that we are in the situation
of Proposition 2.7 (as described in the paragraph preceding the proposition,
at the end of 2.6), and then to apply 2.7(b).

We verify these conditions, for example, for the first series of isomorphisms of
the subgroup of $G'$, as described at the beginning of 6.4. This series corresponds
to conjugating by $x_i$ in $K'$. The conditions take the following form in our case:

$$x_i^{-1} \bigl(H' \cap |\{t, c, d, e;\ a_j, b_j \mid j \ge 0\}|\bigr) x_i = H' \cap x_i^{-1} |\{t, c, d, e;\ a_j, b_j \mid j \ge 0\}| x_i;$$

i.e., if we use the definition of $H'$ and the table,

$$x_i^{-1} H' x_i = H' \cap |\{t, c^{p_{4i}}, d, e^{p_{4i}};\ a_j, b_j \mid j \ge 0\}|.$$

Since $x_i^{-1} [n] x_i = [p_{4i} n]$, the inclusion $\subset$ is obvious. Conversely, suppose
we are given an element in $H'$ that is written as a reduced word in the
$[n]$: $\prod_{j \ge 0} [n_j]^{\varepsilon_j}$, $\varepsilon_j = \pm 1$. We consider the corresponding reduced word $g$ in
$G'$. We show that if all the powers of $c$ and $e$ that occur in $g$ are divisible by
$p_{4i}$, then all the $n_j$ with nonzero $\varepsilon_j$ in the above product are divisible by $p_{4i}$,
i.e., $[n_j] \in x_i^{-1} H' x_i$.

In fact, let $\bar g$ = the image of $g$ in $|\{c, d, e\}|$ under the homomorphism that
takes $t, a_j$, and $b_j$ to 1. Since $\overline{[n]} = c^n d e^n$, it follows that all the $\overline{[n]}$ are free, and
that $\bar g$ uniquely determines the sequence $\{\varepsilon_j n_j\}$. It is not hard to see that the
formulas that express $\varepsilon_j n_j$ in terms of the powers of $c$ and $e$ that occur in the
reduced word $\bar g$ are linear with integer coefficients (more precisely, they are a
disjunction of linear formulas accompanied by inequality conditions). Therefore,
if all these powers are divisible by $p_{4i}$, then so is $n_j$.

This completes the proof.


IX 


Constructive Universe and Computation 


1 Introduction: A Categorical View of Computation 


1.1. Words and integers: two constructive worlds. (a) In Chapters I and 
II we have studied alphabets, words (finite sequences of letters of an alphabet), 
expressions (certain syntactically well-formed words such as terms and formulas
defined in I.2.3), deductions (finite sequences of formulas defined in II.5.1).

Let us fix an alphabet of a first-order language and denote by $W \supset F$ the
sets of words and formulas respectively.

Studying deducibility, we have implicitly introduced the set $D \subset F$ of all
formulas deducible from, say, a fixed finite set of formulas (axioms). This whole 
set D can be systematically generated and well ordered following a finitely 
describable procedure that, say, first totally orders the alphabet, then totally 
orders elementary steps of deductions etc., prescribing in what order to apply 
them iteratively to the axioms and already deduced formulas. 

In this way we get a bijection $\mathbf{Z}^+ \to D$ that is intuitively “computable,”
together with the inverse bijection. Of course, it is a simple particular case of 
numbering defined in VII.1.2 and studied later on in VII.1. See also II.11 for a 
useful numbering of all formulas in the Smullyan language. 
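The simplest instance of such a computable bijection is a numbering of the world $W$ of all words, rather than of $D$. The Python sketch below is an illustration under stated assumptions: the alphabet is a string of distinct letters in a fixed total order, words are listed by length and then alphabetically, and the names `number_of` and `word_of` are hypothetical.

```python
def number_of(word, alphabet):
    """Position (starting from 1) of word in the length-then-alphabetical listing."""
    k = len(alphabet)
    n = 0
    for ch in word:
        n = n * k + alphabet.index(ch) + 1   # bijective base-k digits 1..k
    return n + 1                             # the empty word gets number 1


def word_of(n, alphabet):
    """Inverse bijection: the n-th word, n >= 1."""
    k = len(alphabet)
    n -= 1
    letters = []
    while n > 0:
        n, r = divmod(n - 1, k)
        letters.append(alphabet[r])
    return "".join(reversed(letters))
```

Both directions are computed by short loops, which is the intuitive content of calling the bijection and its inverse computable. A numbering of $D$ would replace the listing of all words by a systematic listing of deductions.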

Having achieved in this way the encoding of certain linguistic constructions 
by arithmetic ones, we have been able in Part III to reduce many problems of 
syntax (and partly semantics) of formal languages to number theory. 

(b) We could have considered $\mathbf{Z}^+$ as a set of certain words in a finite
alphabet as well, for example, as the set of binary strings whose first bit is 1. 
Then the whole theory of computability in Chapter V could have been based on 
the notion of Turing machine(s), in place of elementary arithmetic. This view- 
point, leading to the “same” notion of computability and the same supply of 
computable (partially recursive) functions, nevertheless enriches our intuition 
in two essential respects. 

(i) Whereas before Alan Turing, the most common mental image of math- 
ematical reasoning was related to some form of (written) language, Turing 
represented computation as the dynamical evolution of an idealized physical 
system. 
This dethroning of the linguistic metaphor and its replacement by a metaphor
grounded in science was a great breakthrough, and a premonition of the age of 
computers. 

Among other developments, Turing’s metaphor broke the ground for 
(at first mental) replacement of the classical computing machine by a quan- 
tum one. The burgeoning theory of quantum computers owes Turing this debt 
of gratitude. 

(ii) Turing’s insight allowed him to undertake a microscopic analysis of the 
intuitive idea of algorithmic computation. In a sense, he found its genetic code. 
The atom of information is one bit, the atomic operators can be chosen to act 
upon one/two bits, and to produce changes in the output of the same restricted 
size. Finally, the sequence of operations at each step is strictly determined 
by the local environment of bounded size, again several bits. Needless to say, 
mathematically “the same” idea can be described in purely linguistic terms. In 
fact, Markov’s normal algorithms do just that. But as we argued above, this 
would constitute a philosophical regression. 

One goal of this chapter is to go in the reverse direction, and to present 
a “macrocosm” of the classical theory of computation. 

The sets Z+, W, F, D are examples of what we will call below constructive 
worlds. Elements of these sets—integers, words, formulas, deducible formulas— 
are constructive structures of the respective kind. Other examples include worlds 
of finite graphs, finite groups, finite rings (up to isomorphism, or “all” in a fixed 
countable universe of sets). 

Each of these worlds is countably infinite, but it is natural to allow also 
finite constructive worlds, such as all binary strings of restricted length. 

In Sections 2 and 3 below we will unite different constructive worlds into a 
constructive universe. It will be a category, with constructive worlds as objects, 
and semicomputable functions as morphisms. Church’s thesis will get a very 
natural reformulation: 


Categorical Church’s Thesis: Any two constructive universes are 
equivalent. 


For more detailed explanations, see Section 2 below, especially 
Comments 2.3. 


1.2. Languages as categories. In Sections 4 and 5 of this chapter, we explain 
that there exist natural constructive worlds that are themselves categories, and 
at the same time languages, that are more convenient for describing morphisms 
between constructive worlds than conventional languages, discussed in Chapters
I and II of this book.

Roughly speaking, we can base the theory of recursive functions on a con- 
structive world of descriptions of these functions, whereas the set of functions 
themselves does not form a constructive world. 

This raises a challenge: to find a well-structured world of descriptions faith- 
fully reflecting properties of recursive functions as morphisms. 

Our suggestion elaborated in Section 3 is motivated, on the one hand, by 
progress in general algebra, the theory of (generalized) operads, and on the 
other hand, by the recent paper by N. Yanofsky (math.LO/0602053), who has
constructed a specific operad acting on primitive recursive functions. 


We may and will treat operads as functors on appropriate categories of deco- 
rated graphs. Such graphs themselves form constructive worlds, with effectively 
computable finite sets of morphisms. If we admit these categories as new types 
of languages, then a functor defined on such a category becomes the categorical 
version of a model of this language. 


The decorated graphs are idealized versions of flowcharts, which are quite 
popular in the description of various computational processes. Already in the 
1960s, Dana Scott, among others, used an appropriately formalized version 
of them. He united them into a lattice which can be treated as a category 
satisfying strong additional restrictions: see his survey paper “The lattice of 
flow diagrams” in Springer Lecture Notes in Math, vol. 188 (1971). 


This, and the return to the Turing philosophy, complemented by the 
progress of quantum physics, motivates the last subject matter of this chapter: 
Introduction to the theory of quantum computation. 


1.3. Why quantum computation? Information processing (computation) is 
the dynamical evolution of a highly organized physical system produced by 
technology (computer) or nature (brain). The initial state of this system is 
(determined by) its input; its final state is the output. 


Physics describes nature in two complementary modes: classical and quantum.
Up to the 1990s, the basic mathematical models of computing mimicked
classical automata, although the first suggestions for studying quantum models
date back at least to 1980.


Roughly speaking, the motivation to study quantum computing comes from 
several sources: physics and technology, cognitive science, and mathematics. We 
will briefly discuss them in turn. 


(i) Physically, the quantum mode of description is more fundamental than 
the classical one. In the 1970s and 1980s it was remarked that because of the 
superposition principle, or quantum entanglement, it is computationally infea- 
sible to simulate quantum processes on classical computers. Roughly speaking, 
in quantizing a classical system with N states we obtain a quantum system 
whose state space is an $(N - 1)$-dimensional complex projective space whose
volume grows exponentially with N. One can argue that the main preoccupation 
of quantum chemistry is the struggle with the resulting difficulties. Reversing 
this argument, one might expect that quantum computers, if they can be built 
at all, will be considerably more powerful than classical ones. 


Serious preoccupation with quantum computing has also been stimulated 
by rapid progress in the microfabrication techniques of modern computers. It 
has already led us to the level where quantum noise becomes an essential hin- 
drance to the error-free functioning of microchips. It is only logical to start 
exploiting the essential quantum-mechanical behavior of small objects in 
devising computers, instead of neutralizing it. 
(ii) As another motivation, one can invoke highly speculative, but intriguing,
conjectures that the “wetware” of brains in fact somehow relies upon quantum 
computations. 

Even without subscribing to this idea wholeheartedly until more experimen- 
tal data are generated, we must be aware of the great quantitative discrepancy 
between the information processing capacity of the brain and our understanding 
of how it might do what it does. 

For example, the IBM Deep Blue chess computer, which in 1996-1997
played at the level of the world champion Kasparov, could evaluate about
$10^6$ positions per second and search the game tree to a depth of about 10
moves/countermoves, and up to 40 in exceptional cases.

Since the characteristic time of neuronal processing is about $10^{-3}$ sec, it is
very difficult to explain how the classical brain could possibly do the job and
play chess as successfully. Existing models of neural networks cannot pass this
test by a very wide margin.

A less spectacular but no less resource-consuming task is speech
generation and perception, which is routinely done by billions of human brains,
but still presents a formidable challenge for modern computers using classical
algorithms.

Computational complexity of cognitive tasks has several sources: basic vari- 
ables can be fields; a restricted number of small blocks can combine into expo- 
nentially growing trees of alternatives; databases of incompressible information 
have to be stored and searched. 

Two paradigms have been developed to cope with these difficulties: logic- 
like languages and combinatorial algorithms, on the one hand, and statistical 
matching of observed data to an unobserved model, on the other. 

In many cases, the second strategy efficiently supports acceptable perfor- 
mance, but usually cannot achieve the excellence of the Deep Blue level. Both 
paradigms require huge computational resources, and it is not clear how they 
can be organized, unless hardware allows fast and massive parallel computing. 

The idea of “quantum parallelism” (see Section 7 below) is an appealing 
theoretical alternative. However, it is not at all clear that it can be made 
compatible with the available experimental evidence, which depicts the central 
nervous system as a distinctly classical device. 

The following way out might be worth exploring. The implementation of 
efficient quantum algorithms that have been studied so far can be provided by 
one, or several, quantum chips (registers) controlled by a classical computer. 
A very considerable part of the overall computing job, besides controlling quan- 
tum chips, is also assigned to the classical computer. Analyzing a physical device 
of such architecture, we would have direct access to its classical component (elec- 
trical or neuronal network), whereas locating its quantum components might 
constitute a considerable challenge. For example, quantum chips in the brain 
might be represented by macromolecules of the type that were considered in 
some theoretical models for high-temperature superconductivity. 

The difficulties are seemingly increased by the fact that quantum measure- 
ments produce nondeterministic outcomes. Actually, one could try to use this 
to one’s advantage, because there exist situations in which we can distinguish
the quantum randomness from the classical case by analyzing the probability 
distributions and using Bell-type inequalities. With hindsight, one recognizes 
in Bell’s setup the first example of the game-like situation in which quantum 
players can behave demonstrably more efficiently than classical ones. 


(iii) Finally, we turn to mathematics. One can argue that nowadays one
does not even need additional motivation to study quantum automata, given 
the predominant mood prescribing the quantization of “everything that moves.” 
Quantum groups, quantum cohomology, quantum invariants of knots, etc., come 
to mind. This actually seemed to be the primary motivation before 1994 when 
P. Shor devised the first significant quantum algorithm showing that prime
factorization can be done on quantum computers in polynomial time, that is,
considerably faster than by any known classical algorithm.

Shor’s paper gave a new boost to the subject. Another beautiful result, due
to L. Grover, is that a quantum search among $N$ objects can be done in $c\sqrt{N}$
steps. We briefly present these ideas in Sections 8 and 9.

Last, but not least, large-scale quantum computers do not exist as yet. The 
quantum algorithms invented and studied up to now will stimulate the search 
for a technological implementation that—if successful—will certainly correct 
our present understanding of quantum computing and quantum complexity. 


2 Expanding Constructive Universe: Generalities 


In this chapter, given a category $C$ and two of its objects $X, Y$, we will denote
by $C(X, Y)$ the set of morphisms $X \to Y$ in $C$.

All our objects will be sets endowed with an additional structure, and
sets will lie in the initial layers of the Gödel universe $\mathcal{L}$ of constructible sets
(cf. IV.1).

Morphisms will be partial maps.

We choose once and for all some concrete sets, representatives of natural
numbers and $\mathbf{Z}^+$ in $\mathcal{L}$, such as $0 = \emptyset$, $1 = \{0\}$, $2 = \{0, 1\}, \dots$ and
$\mathbf{Z}^+ = \{1, 2, 3, \dots\}$.

We will first discuss some peculiarities of categories whose morphisms are 
partial maps of sets. 


2.1. Category of sets and partial maps: two approaches. (a) In the first
approach, partial maps from a set $X$ to a set $Y$ are pairs $(f, D(f))$ where $D(f)$
is a subset of $X$ (possibly empty), and $f : D(f) \to Y$ is an actual map. Denote
by $\mathrm{Par}(X, Y)$ the set of partial maps. The composition is defined exactly as
was done for a particular case in V.2.3:

$$(g, D(g)) \circ (f, D(f)) = (g \circ f,\ f^{-1}(D(g))).$$

One easily sees that in this way we get a category, say ParSets.
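For finite sets the composition rule can be tried out directly. In the Python sketch below (an illustration only), a partial map is modeled as a dictionary whose key set plays the role of $D(f)$; the name `compose` is an assumption of the sketch.

```python
def compose(g, f):
    """Composite 'g after f' of partial maps given as dicts.

    The result is defined exactly on f^{-1}(D(g)): those x in D(f)
    whose image f(x) lies in D(g).
    """
    return {x: g[f[x]] for x in f if f[x] in g}
```

With `f = {1: "a", 2: "b", 3: "c"}` and `g = {"a": 10, "c": 30}`, the composite is defined on `1` and `3` only. Composing with the empty dictionary on either side gives the empty dictionary again.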
Notice that each set of morphisms $\mathrm{Par}(X, Y)$ is pointed, in the sense that
it has a canonical element “empty map,” say, $\emptyset_{X,Y}$. Its composition with any
other morphism is again the respective empty map.

(b) This last remark motivates the consideration of another category: that
of pointed sets PSets. An object of PSets is a pair $(X, *_X)$, where $*_X \in X$
(so that $X$ cannot be empty). A morphism $(X, *_X) \to (Y, *_Y)$ is an everywhere
defined map $\varphi : X \to Y$ such that $\varphi(*_X) = *_Y$. The composition is evident.

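Deleting marked points, we get a functor PSets $\to$ ParSets:

$$X \mapsto X^{\circ} := X \setminus \{*_X\}, \qquad \varphi \mapsto \varphi^{\circ} := (f, D(f)),$$

where for $\varphi : X \to Y$, $D(f)$ is defined as $\varphi^{-1}(Y^{\circ})$, and $f$ as the restriction of
$\varphi$ to $D(f)$.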

This functor turns out to be an equivalence of categories. 

In fact, a quasi-inverse functor can be constructed by formally adding an
extra marked point $*_X$ to each object $X$ in ParSets, and extending each partial
map $(f, D(f))$ from $X$ to $Y$ by sending $X \setminus D(f)$ to $*_Y$.
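The two functors can be imitated on finite sets. The following Python sketch is illustrative only: `STAR` is a hypothetical marker assumed not to belong to any of the sets involved, and `complete` and `delete_point` are names chosen for this sketch.

```python
STAR = "*"   # hypothetical name for the added base point


def complete(f, X):
    """Extend a partial map f on X to a total base-point-preserving map on X with STAR added."""
    total = {x: f.get(x, STAR) for x in X}   # X \ D(f) is sent to the base point
    total[STAR] = STAR
    return total


def delete_point(phi):
    """Recover the partial map: drop STAR and every element sent to STAR."""
    return {x: y for x, y in phi.items() if x != STAR and y != STAR}
```

Applying `delete_point` after `complete` returns the original partial map, and completion is compatible with composition: composing two completed maps as ordinary total maps gives the completion of the composite partial map.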

This formal completion of sets and partial maps by adding “improper,” 
“infinite” elements was reinvented many times, in particular, in topology (one- 
point compactification) and in theoretical computer science. I am grateful to 
A. Beilinson, who drew my attention to the good categorical properties of this 
operation. 

The basic category of sets is endowed with the symmetric monoidal structure:
Cartesian product. It is naturally extended to ParSets and to PSets. In PSets
one can put

$$(X, *_X) \times (Y, *_Y) := (X^{\circ} \times Y^{\circ}) \cup \{(*_X, *_Y)\},$$

so that the equivalence above becomes monoidal equivalence.

An equivalent (functorially isomorphic) definition uses “reduced product.”
Namely, $(X, *_X) \times (Y, *_Y)$ can be defined as $X \times Y$ with the “coordinate cross”
$X \times \{*_Y\} \cup \{*_X\} \times Y$ contracted to the base point.

There is another symmetric monoidal structure on Sets: disjoint union $\coprod$.

It is not canonical and requires choices: what is the disjoint union of a set 
with itself? For a construction, see, e.g., F. Borceux, Handbook of Categorical 
Algebra 2 (Cambridge UP, 1994), Example 6.1.9. 

This structure, as soon as it is chosen, can be directly extended to ParSets 
and PSets. 

Below, we will use both points of view on partial maps interchangeably, as 
equivalent ones. 


2.2. Definition. A subcategory $C$ of ParSets as above is called a constructive
universe if it contains the constructive world $\mathbf{Z}^+$ of all integers $\ge 1$, and
also finite sets $\emptyset, \{1\}, \dots, \{1, \dots, n\}, \dots$ and satisfies the following conditions
(a)-(d):

(a) $C(\mathbf{Z}^+, \mathbf{Z}^+)$ is defined as the set of all partially recursive functions.

(b) Any infinite object of $C$ is isomorphic in $C$ to $\mathbf{Z}^+$.

(c) If $U$ is finite, $C(U, V)$ consists of all partial maps $U \to V$. If $V$ is finite,
$C(U, V)$ consists of $f$ such that $D(f)$ and inverse images of all elements of
$V$ are enumerable.

(d) $C$ inherits from ParSets two compatible symmetric monoidal structures:
Cartesian product $\times$ and disjoint sum $\coprod$.

2.3. Comments. (i) The statement (b) is a version of Church’s thesis.
In V.2.4 we stated Church’s thesis in the context of functions from $(\mathbf{Z}^+)^m$ to
$(\mathbf{Z}^+)^n$.

Here we make it simultaneously broader and vaguer. Imagine that we want 
to speak about algorithmic processing of variable finite objects of a given type 
U into similar objects of possibly different type V. U and V might be words, 
graphs, groups, finite and finitely describable Bourbaki structures, .... We pos- 
tulate that one always can translate such a processing into the calculation of 
values of a recursive function. The main step in the reduction is the choice of 
two “computable numberings,” those of U and V. 

Formally, such a numbering is an isomorphism $\mathbf{Z}^+ \to U$ in C. Two
different numberings of the same constructive world can differ only by a recursive
permutation of numbers, that is, by an automorphism of $\mathbf{Z}^+$ in C. We will
call such numberings equivalent.

In practice, a numbering of a set-theoretically defined constructive world 
U, embedding it into C, is chosen in such a way that some “natural” construc- 
tions on constructive objects of the type U given a priori become obviously 
computable. 

For example, we can renumber U in an eminently theoretically important 
and sophisticated way, ordering U by the growing Kolmogorov complexity of 
its constructive objects. But then the simplest operations would become non- 
computable. Generally, such a Kolmogorov numbering will not be an isomor- 
phism in C: cf. further discussion in Section 10. 

Returning to (b), we see that each infinite constructive world, that is, an 
object of C, is endowed with a well-defined class of enumerable subsets. This 
fact is used in the statement (c). The axiom (c) is justified by the fact that 
partial recursive functions on $\mathbf{Z}^+$ taking only a finite number of values are
characterized by the stated properties. 

Similarly, decidable subsets are well defined. 


(ii) Notice that because of (c), two finite constructive worlds are isomorphic 
iff they have the same cardinality, and the automorphism group of any finite U 
consists of all permutations of U. Therefore, the whole category C is equivalent to 
its full subcategory, whose objects are $\mathbf{Z}^+$ and finite sets, one of each cardinality.

However, this subcategory is too small to accommodate even our standard 
definition of partial recursive functions in V.2: we have to extend it by Cartesian 
products. For many constructions, it is also convenient to have disjoint sums. 
This is the reason we completed the definition by the requirement (d). It implies 
that canonical projections of Cartesian products and structure embeddings into 
disjoint sums are computable.

(iii) In view of the previous remark, any two constructive universes are
equivalent (even as monoidal categories). Nevertheless, as a matter of principle, 
we always consider C as an open category, and at any moment allow ourselves 
to add to it new constructive worlds. If some infinite V is added to C, it must 
come together with a class of equivalent numberings. 

In this way, we may declare the world of a decidable subset of any object of 
C to be an object of C. 

Here is another example. The world $U^*$ of finite sequences of elements of a
constructive world U (“words in the alphabet U”) is endowed with a canonical
class of numberings. Hence we may assume that C is closed with respect to
the construction $U \mapsto U^*$. All natural functions, such as the length of a word
$U^* \to \mathbf{Z}^+$, or the ith letter of a word $U^* \to U$, are computable. Moreover, if
$f: U \to V$ is a morphism in C, then the partial function $f^*$ sending $(u_1, \ldots, u_n)$
to $(f(u_1), \ldots, f(u_n))$, whenever all $f(u_i)$ are defined, is a morphism $U^* \to V^*$,
and $(g \circ f)^* = g^* \circ f^*$. Hence $U \mapsto U^*$ extends to a covariant endofunctor $C \to C$.
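
The functoriality of the word construction can be checked on small examples. The sketch below makes a simplifying assumption that is ours, not the text's: an undefined value is modeled by `None` rather than by a non-terminating computation. The helper names `star`, `compose`, `half`, and `pred` are hypothetical.

```python
from typing import Callable, Optional, Tuple

Partial = Callable[[int], Optional[int]]  # None is our stand-in for "undefined"


def star(f: Partial) -> Callable[[Tuple[int, ...]], Optional[Tuple[int, ...]]]:
    """Return f*: apply f letter by letter; undefined if any f(u_i) is undefined."""
    def f_star(word):
        out = []
        for u in word:
            v = f(u)
            if v is None:
                return None
            out.append(v)
        return tuple(out)
    return f_star


def compose(g, f):
    """Composition g o f of partial maps (works for letters and for words)."""
    def h(u):
        v = f(u)
        return None if v is None else g(v)
    return h


def half(n):   # defined only on even numbers
    return n // 2 if n % 2 == 0 else None


def pred(n):   # defined only for n > 1, so that values stay in Z+
    return n - 1 if n > 1 else None
```

On any test word, `star(compose(pred, half))` and `compose(star(pred), star(half))` agree, including on words where the result is undefined.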


(iv) Some (or even “all”?) infinite constructive worlds U come together
with a natural class of bijective numberings $u: \mathbf{Z}^+ \to U$ such that any two
numberings u, v in this class have one of the following properties:

$u^{-1} \circ v$ is a primitive recursive permutation;

or even

$u^{-1} \circ v$ is a polynomial-time computable permutation (cf. 6.5 below).

If a version of C includes only objects satisfying the first (resp. the second)
condition, one can define a subcategory $C_{prim}$ (resp. $C_{pol}$) having the
same objects, but only primitive recursive (resp. polynomial-time computable)
morphisms.

The assumption that “all” constructive worlds do in fact satisfy one of the 
two requirements could be called the “primitive recursive,” resp. “polynomial- 
time” Church’s thesis. 


2.4. A natural numbers object. We could have replaced $\mathbf{Z}^+$ in the above
discussions by an abstract natural numbers object in an unspecified category 
B. Its definition conforms to a general spirit of categorical reasoning: sets of 
morphisms rather than objects should be bearers of additional structures. 

More precisely, assume that B admits a terminal object 1. A triple (N, z, s)
in B, consisting of an object N and two morphisms

$$z: 1 \to N, \qquad s: N \to N,$$

is called a natural numbers object if for any other pair of morphisms in B of the
form

$$f: 1 \to X, \qquad g: X \to X$$

there exists a unique morphism $h: N \to X$ such that

$$h \circ z = f, \qquad h \circ s = g \circ h,$$

that is, the diagram

$$\begin{array}{ccccc}
1 & \xrightarrow{\;z\;} & N & \xrightarrow{\;s\;} & N \\
\downarrow & & \downarrow{\scriptstyle h} & & \downarrow{\scriptstyle h} \\
1 & \xrightarrow{\;f\;} & X & \xrightarrow{\;g\;} & X
\end{array}$$

is commutative. Of course, the leftmost arrow can only be $\mathrm{id}_1$.


This is the simplest form of categorical recursion: values of the morphism h
on the categorical points $s^{\circ n} \circ z \in B(1, N)$ are given by $g^{\circ n} \circ f \in B(1, X)$. Thus,
f is the initial condition (value at n = 0), and g corresponds to one iterative
step applied to the previous value.

Clearly, $\mathbf{Z}^+$ together with

$$z: 1 \mapsto 1 \in \mathbf{Z}^+, \qquad s: n \mapsto n + 1$$

is a natural numbers object in the category of sets.
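
In the category of sets the unique morphism h can be written down directly. The sketch below is an illustration only: it takes the point f as a plain value, lets z pick out 1, and uses the hypothetical name `nno_morphism`.

```python
from typing import Callable, TypeVar

X = TypeVar("X")


def nno_morphism(f: X, g: Callable[[X], X]) -> Callable[[int], X]:
    """Return the map h on Z+ with h(1) = f and h(n + 1) = g(h(n)).

    Here z picks out 1 and s is n -> n + 1, so the two conditions
    are exactly h o z = f and h o s = g o h.
    """
    def h(n: int) -> X:
        if n < 1:
            raise ValueError("h is defined on positive integers only")
        value = f
        for _ in range(n - 1):   # apply the iterative step n - 1 times
            value = g(value)
        return value
    return h


powers_of_two = nno_morphism(1, lambda x: 2 * x)
```

With the initial condition 1 and the step "double", h enumerates the powers of two; both defining identities can be checked pointwise on any finite range.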

We will return to this philosophy, discussing normal models of computation 
in 6.1 below. 

In Sections 3-5, however, we stick to the more down-to-earth approach
sketched at the beginning of this section.


3 Expanding Constructive Universe: Morphisms 


3.1. Programming methods. We now turn to the computability properties
of the sets of morphisms C(U, V). Again, it is a matter of principle that C(U, V)
itself, and even $C_{prim}(U, V)$, is not a constructive world if U is infinite.

Indeed, otherwise we would have an intuitively computable bijective numbering
of all partial recursive (resp. primitive recursive) functions $\mathbf{Z}^+ \to \mathbf{Z}^+$.
Using numbers of such functions as their descriptions, we could algorithmically
distinguish them. But the latter problem is not algorithmically solvable.

In order to compensate for this with a sample of positive statements, let us
consider the following situation.

Any diagram in C

$$\mathrm{ev}_P: P \times U \to V$$

(evaluation morphism) defines a partial map $P \to C(U,V)$, $p \mapsto \bar{p}$, where
$\bar{p}(u) := \mathrm{ev}_P(p, u)$.

3.2. Definition.


(a) We will say that a constructive world P = P(U,V) together with the evaluation
map $\mathrm{ev}_P$ as above is a programming method. Elements of P are called
programs.

(b) A programming method $(Q = Q(U,V), \mathrm{ev}_Q)$ is called versal (resp. primitive
versal) if two conditions are satisfied.

First, the map $Q \to C(U,V): q \mapsto \bar{q}$ is surjective (resp. its image consists
of all primitive recursive morphisms).

Second, for any programming method $(P = P(U,V), \mathrm{ev}_P)$ with the same
source U and target V (resp. for any $(P, \mathrm{ev}_P)$ producing only primitive recursive
morphisms) there is at least one compilation morphism in C

$$\mathrm{comp}: P(U,V) \to Q(U,V),$$

that is, an everywhere defined, computable map $P \to Q$ such that if comp(p) =
q, then $\bar{p} = \bar{q}$.


3.3. Claim. Versal programming methods exist. 


PROOF. For brevity, we will consider only the case of infinite U, V. Then P is
infinite as well. Since any infinite object is isomorphic to $\mathbf{Z}^+$, we will identify
U, V with $\mathbf{Z}^+$, but for convenience we will keep the notation P for the world
of programs. Thus we may restrict ourselves to considering only evaluation
morphisms $\mathrm{ev}: P \times \mathbf{Z}^+ \to \mathbf{Z}^+$.

Such a morphism computes all recursive functions $\mathbf{Z}^+ \to \mathbf{Z}^+$ iff it is a versal
family in the sense of V.5.7.

Now consider another versal family, that of recursive functions of two variables
$P \times \mathbf{Z}^+ \to \mathbf{Z}^+$. Let $P'$ be its base:

$$\mathrm{Ev}: P' \times P \times \mathbf{Z}^+ \to \mathbf{Z}^+.$$

We now affirm that the programming method $(Q := P' \times P, \mathrm{Ev})$ is versal.

In fact, versality of Ev implies that for any $\mathrm{ev}: P \times U \to V$, there exists
$p' \in P'$ such that $\mathrm{Ev}(p', p, u) = \mathrm{ev}(p, u)$ for all $(p, u) \in P \times \mathbf{Z}^+$. Therefore, the
map

$$\mathrm{comp}: P \to Q: \quad p \mapsto (p', p)$$

is a compilation morphism for (P, ev).
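
The shape of this argument, though of course not versality itself, can be mimicked with a finite toy. In the sketch below the dictionary `P_PRIME` plays the role of the base of the two-variable family, and the two evaluation maps `ev_affine` and `ev_divide` are invented purely for illustration; `None` stands in for an undefined value, and programs are assumed to be built from positive integers.

```python
from typing import Callable, Dict, Optional, Tuple


def ev_affine(p: Tuple[int, int], u: int) -> Optional[int]:
    """Program (a, b) computes u -> a*u + b (everywhere defined)."""
    a, b = p
    return a * u + b


def ev_divide(p: int, u: int) -> Optional[int]:
    """Program d computes u -> u/d, defined only when d divides u."""
    return u // p if u % p == 0 else None


# Toy stand-in for the base P': each index names one evaluation morphism.
P_PRIME: Dict[str, Callable] = {"affine": ev_affine, "divide": ev_divide}


def Ev(p_prime: str, p, u: int) -> Optional[int]:
    """Ev(p', p, u) = ev(p, u) for the programming method indexed by p'."""
    return P_PRIME[p_prime](p, u)


def comp_for(p_prime: str):
    """Compilation morphism P -> Q = P' x P, p -> (p', p), for a fixed p'."""
    return lambda p: (p_prime, p)


q = comp_for("divide")(3)   # compile the program 3 of the 'divide' method
```

The compiled program `q` is just the pair `("divide", 3)`, and evaluating it through `Ev` gives the same partial function as evaluating 3 through `ev_divide`.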

Remark. We can now make precise the statement made at the beginning of 3.1.
Namely, it means that for any programming method P(U,V), the canonical
map $P(U,V) \to C(U,V)$ cannot be bijective if U is infinite. In fact, if it is
surjective, then it is essentially the same as a versal family; but the equivalence
relation on the base of a versal family induced by $p \mapsto \bar{p}$ is not decidable
(or even recursively enumerable).

3.4. Composition of morphisms at the level of programming methods.
Let $U_1, U_2, U_3$ be three objects of C, and $(Q_{ij}, \mathrm{ev}_{ij})$ three versal programming
methods for $C(U_i, U_j)$, $ij = 12, 13, 23$ respectively.

Then $(Q_{23} \times Q_{12}, \mathrm{ev}_{23} \circ (\mathrm{id}_{Q_{23}} \times \mathrm{ev}_{12}))$ is a programming method for
$C(U_1, U_3)$. It calculates the composition of morphisms $U_1 \to U_2 \to U_3$.

Since $Q_{13}$ is versal for morphisms $U_1 \to U_3$, there exists a compilation
morphism

$$\mathrm{comp}: Q_{23} \times Q_{12} \to Q_{13}$$

that reproduces composition of morphisms on the level of programs.

Notice that even if we restrict ourselves to the full subcategory with one
object $U_1 = U_2 = U_3 = \mathbf{Z}^+$ and fix a choice of Q and comp, the composition of
morphisms on the level of programs generally will not be associative. Moreover,
a program calculating the identity morphism generally will not be the identity for
program composition.
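
A toy model makes this failure visible. In the following sketch, which is our own illustration and not a versal method, programs are nested tuples, `ev` interprets them, and `comp` composes two programs by simply recording the pair. Only total functions appear, to keep the example short.

```python
def ev(q, u):
    """Evaluate a toy program q (a nested tuple) on the input u."""
    tag = q[0]
    if tag == "suc":
        return u + 1
    if tag == "dbl":
        return 2 * u
    if tag == "id":
        return u
    if tag == "seq":              # ("seq", second, first): run first, then second
        _, second, first = q
        return ev(second, ev(first, u))
    raise ValueError(f"unknown program {q!r}")


def comp(q2, q1):
    """Composition on the level of programs: the program for q2 after q1."""
    return ("seq", q2, q1)


SUC, DBL, ID = ("suc",), ("dbl",), ("id",)
left = comp(comp(SUC, DBL), SUC)    # (suc o dbl) o suc
right = comp(SUC, comp(DBL, SUC))   # suc o (dbl o suc)
```

The programs `left` and `right` are different objects although they compute the same function, and `comp(ID, SUC)` is not the program `SUC`. Associativity and unit laws hold only after passing from programs to the morphisms they compute.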

This motivates the following definition. 


3.5. Definition. A category of algorithms over a constructive universe C is a
pair consisting of a category A and a functor $J: A \to C$ with the following
properties:

(a) A is enriched over C.

This means in particular that morphism sets in A are objects of C, and the
composition maps $A(U,V) \times A(V,W) \to A(U,W)$, as well as identities, are
morphisms in C fitting into standard commutative diagrams.

(b) J identifies Ob A with a subset of Ob C. We will make no distinction
between U and J(U).

(c) For any objects U, V of A, A(U,V) is a programming method. In particular,
it comes together with the evaluation morphism in C

$$\mathrm{ev}_{U,V}: A(U,V) \times U \to V.$$

This morphism must satisfy the following condition: for all $f \in A(U,V)$
and $u \in U$,

$$J(f)(u) = \mathrm{ev}_{U,V}(f, u).$$


3.6. Comments. (i) The notion of a category of algorithms formalized in 
the previous definition was introduced (in a somewhat less explicit form) by 
N. Yanofsky in math.LO/0602053. The same paper contains a construction of 
such a category in which J defines surjections $J: A(U,V) \to C_{prim}(U, V)$.


(ii) Since A is enriched over C, we actually work here in a 2-categorical
context: morphisms in A, being objects of C, are connected by 2-morphisms. In
particular, the associativity of composition is not a literal family of identities
$h \circ (f \circ g) = (h \circ f) \circ g$ but rather a family of canonical isomorphisms

$$a_{h,f,g}: h \circ (f \circ g) \to (h \circ f) \circ g$$

interconnected by the standard coherence conditions.

A similar remark applies to left and right identities.


(iii) Given a category A as above, we will call programs $p \in A(U,V)$ algorithms.
In fact, N. Yanofsky reserves this name for a category satisfying stronger
coherence properties, which is in a certain sense canonical. A part of his
constructions will be described in Section 5.


4 Operads and PROPs


In this section, we will consider a somewhat reduced version $C_0$ of the constructive
universe with two monoidal structures $(C, \times, \coprod)$ defined in 2.2. First, we
will exclude all finite objects of cardinality $\geq 2$.


4.1. Definition. $(C_0, \times)$ is a full monoidal subcategory of $(C, \times)$ such that each
object of $C_0$ is either infinite or has cardinality 1.


4.2. Reduction. From Definition 2.2 it follows that $C_0$ is equivalent to its
full subcategory consisting of Cartesian powers $(\mathbf{Z}^+)^m$, $m \geq 0$, and partial
recursive functions. Moreover, $(\mathbf{Z}^+)^m \times (\mathbf{Z}^+)^n$ can be canonically identified
with $(\mathbf{Z}^+)^{m+n}$, so that the category will become strict. The zeroth Cartesian
power is a one-point set {*}, the unit for the monoidal structure.

The family of morphisms $C((\mathbf{Z}^+)^m, (\mathbf{Z}^+)^n)$, and in fact similar families of
morphisms in any symmetric or enriched symmetric monoidal category, are naturally
endowed with structures known under the names collections and PROPs.


4.3. Definition. (a) A collection P in a category B is a family of objects
P(m,n), $m, n \geq 0$, in B, together with group homomorphisms

$$S_m \times S_n^{op} \to \mathrm{Aut}_B\, P(m,n).$$

We interpret such a homomorphism as a pair consisting of a left action of
the symmetric group $S_m$ and a right action of $S_n$ on P(m,n) that commutes
with it.


(b) A morphism of collections $f: P \to Q$ is a family of morphisms
$f_{m,n}: P(m,n) \to Q(m,n)$ commuting with the action of symmetric groups.


4.4. Endomorphism collections. Let $(\mathcal{E}, \times)$ be a symmetric monoidal category
with unit object e. For $U \in \mathrm{Ob}\,\mathcal{E}$, put

$$\mathrm{Coll\,End}(U)(m,n) := \mathcal{E}(U^n, U^m).$$

The action of $S_m$ (resp. $S_n^{op}$) is induced by permutations of factors in the
Cartesian powers $U^m$ (resp. $U^n$). The zeroth power is interpreted as e.

Whenever $\mathcal{E}$ is an enriched category, one must first make sense of permutation
groups acting on objects in the category of morphisms. This does not
present any additional difficulties.

A PROP is a collection endowed with additional composition laws mutually
compatible with the actions of the symmetric groups.


4.5. Vertical and horizontal products in endomorphism collections.
Endomorphism collections are naturally endowed with two additional structures:

(a) Vertical products

$$\mathcal{E}(U^m, U^n) \times \mathcal{E}(U^n, U^l) \to \mathcal{E}(U^m, U^l): \quad (f, g) \mapsto g \circ f.$$

(b) Horizontal products

$$\mathcal{E}(U^{m_1}, U^{n_1}) \times \cdots \times \mathcal{E}(U^{m_s}, U^{n_s}) \to \mathcal{E}(U^{m_1 + \cdots + m_s}, U^{n_1 + \cdots + n_s}).$$

The latter are induced by the monoidal structure in $\mathcal{E}$:

$$(f_1, \ldots, f_s) \mapsto f_1 \times \cdots \times f_s.$$

If $\mathcal{E}$ is enriched, the category of morphisms must be strict monoidal, and its
monoidal structure must be compatible with that of $\mathcal{E}$ in the standard way, so
that the horizontal products still make sense.

In a constructive universe, a vertical product is the composition/substitution
of partial maps.
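
For maps between Cartesian powers of the integers both products are easy to model. The sketch below is a simplified illustration: it treats only total maps, follows the convention of (a) that a morphism $U^m \to U^n$ has m inputs and n outputs, and uses the hypothetical names `Mor`, `vertical`, and `horizontal`.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Mor:
    """A map U^m -> U^n, stored with its numbers of inputs and outputs."""
    m: int
    n: int
    fn: Callable[[Tuple[int, ...]], Tuple[int, ...]]

    def __call__(self, *args: int) -> Tuple[int, ...]:
        if len(args) != self.m:
            raise ValueError("wrong number of inputs")
        out = self.fn(tuple(args))
        if len(out) != self.n:
            raise ValueError("wrong number of outputs")
        return out


def vertical(f: Mor, g: Mor) -> Mor:
    """Vertical product (f, g) -> g o f."""
    if f.n != g.m:
        raise ValueError("outputs of f must match inputs of g")
    return Mor(f.m, g.n, lambda xs: g(*f(*xs)))


def horizontal(*fs: Mor) -> Mor:
    """Horizontal product (f_1, ..., f_s) -> f_1 x ... x f_s."""
    def fn(xs):
        out, pos = (), 0
        for f in fs:              # feed each factor its own block of inputs
            out += f(*xs[pos:pos + f.m])
            pos += f.m
        return out
    return Mor(sum(f.m for f in fs), sum(f.n for f in fs), fn)


add = Mor(2, 1, lambda xs: (xs[0] + xs[1],))
dup = Mor(1, 2, lambda xs: (xs[0], xs[0]))
suc = Mor(1, 1, lambda xs: (xs[0] + 1,))
```

One instance of the compatibility of horizontal and vertical products: `horizontal(vertical(dup, add), vertical(suc, suc))` and `vertical(horizontal(dup, suc), horizontal(add, suc))` give the same values on every pair of inputs.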

These structures in endomorphism collections satisfy a number of cumber- 
some but straightforward universal conditions, which we only list here: 


(i) Associativity of vertical products; units for them in $\mathcal{E}(U^m, U^m)$.
(ii) Compatibility of vertical products with actions of symmetric groups.
(iii) Associativity of horizontal products. 
(iv) Compatibility of horizontal and vertical products. 
(v) Compatibility of horizontal products with actions of symmetric groups. 


Assuming that these conditions have been written formally, we can now give a 
general definition: 


4.6. (Tentative) definition.

(a) A PROP in a category B is a collection in B, endowed with horizontal and
vertical compositions as in 4.5, enjoying the universal properties 4.5 (i)-(v).

(b) An operad in a category B is a collection whose only nontrivial terms are
P(1,n), endowed with a right action of $S_n$ and vertical products that satisfy
4.5 (i), (ii).

The collection Coll End(U) as above is denoted by Prop End(U) when it
is endowed with its natural structures.

Any PROP produces a collection if compositions are forgotten; this functor
under quite general conditions can be proved to have a left adjoint functor: the free
PROP generated by a collection. This gives rise to the notion of a subcollection
of generators of a PROP similar to, say, generators of a monoid.

We are most interested in $\mathrm{Prop\,End}_C(\mathbf{Z}^+)$ as an algebraic approximation
to the constructive universe C. We might also try to restrict ourselves to its
primitive recursive version. However, it turns out that the preceding framework,
even if we take the trouble to formalize it by supplying all commutative diagrams
implicit in Definition 4.6, is too narrow for our goals.


4.7. Example: the collection of basic recursive functions. Working now in
C, we can define the collection of basic recursive functions $R \subset \mathrm{Prop\,End}_C(\mathbf{Z}^+)$,
using the notation of V.2.2. The respective terms of the collection are

$$R(1,0) := \{1^{(0)}\},$$
$$R(1,1) := \{\mathrm{suc}, 1^{(1)}, \mathrm{id}\},$$
$$R(1,n) := \{1^{(n)}, \mathrm{pr}^n_i\} \quad \text{for } n \geq 2.$$

The remaining components of R will be empty.

The action of the symmetric groups is induced by that in $\mathrm{Prop\,End}_C(\mathbf{Z}^+)$.
In fact, it is not identical only on R(1,n): the $\mathrm{pr}^n_i$ are permuted as the i’s are,
$i \in \{1, \ldots, n\}$.

We would like to have an algebraic structure reflecting our knowledge that 
basic functions “generate” all primitive/partial recursive functions. But to do 
this, we lack some necessary operators iteratively acting on basic functions. In 
fact, composition V.2.3 (a) is accommodated in the general definition of PROP,
and juxtaposition can be dealt with if we add the diagonal $\Delta: \mathbf{Z}^+ \to \mathbf{Z}^+ \times \mathbf{Z}^+$,
but the recursion and $\mu$-operator are very specific for C, and we lack general
means to deal with them.
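
To make the role of the missing operators concrete, here is a small sketch of basic functions and generating operators on positive integers. The conventions are assumptions on our part and need not match V.2.3 in detail: recursion starts at 1 with step $h(x, k+1) = g(x, k, h(x, k))$, the $\mu$-operator returns the least y with $f(x, y) = 1$, and `compose` merges juxtaposition and composition into a single helper. An undefined value shows up as a non-terminating loop in `mu`.

```python
def suc(x):
    return x + 1


def one(*xs):
    """The constant function 1 of any arity."""
    return 1


def pr(i):
    """Projection onto the i-th argument (1-based)."""
    return lambda *xs: xs[i - 1]


def compose(g, *fs):
    """Juxtapose f_1, ..., f_k and substitute the result into g."""
    return lambda *xs: g(*(f(*xs) for f in fs))


def recursion(f, g):
    """h(x, 1) = f(x); h(x, k + 1) = g(x, k, h(x, k))."""
    def h(*args):
        *xs, k = args
        value = f(*xs)
        for j in range(1, k):
            value = g(*xs, j, value)
        return value
    return h


def mu(f):
    """h(x) = least y >= 1 with f(x, y) = 1; loops forever if there is none."""
    def h(*xs):
        y = 1
        while f(*xs, y) != 1:
            y += 1
        return y
    return h


# Addition on Z+ obtained from suc and a projection by recursion.
add = recursion(suc, compose(suc, pr(3)))
# Least y with y*y >= x; the test function is illustrative, not built from basics.
ceil_sqrt = mu(lambda x, y: 1 if y * y >= x else 2)
```

Composition fits the PROP formalism directly, whereas `recursion` and `mu` are exactly the operators for which the text says a general algebraic framework is lacking.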

In the next section, we will introduce the constructive world of graphs, 
and its extensions, worlds of decorated graphs. We will turn these worlds into 
categories, and will explain how they provide very convenient linguistic tools for 
speaking about PROPs and similar structures, in particular, about the PROP 
of recursive functions. 

Later we will see that similar constructions naturally arise in
computation theory as well.

The relevant graphs will be (geometric versions of) Boolean circuits, finite 
automata for processing binary input data. 


5 The World of Graphs as a Topological Language 


5.1. Introduction. Generally, each constructive world comes with its own sup- 
ply of “natural operations.” Although any two constructive worlds of the same 
cardinality are connected by a computable isomorphism, this does not mean 
that, say, a natural numbering of formulas in a language of arithmetic pro- 
vides convenient tools for their syntactic analysis or for thinking about their 
interpretations in a model. 

In particular, when we replace nonconstructive sets of morphisms, say
$C(U^m, U^n)$, by a constructive world of respective programming methods, we
have to deal with two different sets of natural operations in this constructive 
world: 


(a) Evaluations (see 3.1), where a programming method being fixed, the main 
operation consists in calculating values of, say, a partial recursive function. 

(b) Operations, producing new programming methods from old ones, such as 
composition, compilation, recursion. 


In principle, the latter are not qualitatively different from evaluations, since 
we can think about programming methods whose inputs and outputs are pro- 
gramming methods as well. 

What is needed for efficient constructivization of programming methods is 
a good encoding scheme, simultaneously intuitive and accommodating natural 
operations.

We already mentioned two mental worlds in which various encoding schemes
can crystallize: 


(i) World of expressions in a language (to which we appealed in previous 
chapters). 

(ii) World based on scientific/engineering imagery, such as Turing’s 
machines, or Boolean circuits (cf. below). 


In this section, we will describe the third, topological one: 


(iii) World of (decorated) graphs: geometric images of information flows and 
hubs where the flows merge, get processed, and diverge again to flow fur- 
ther. 


Moreover, we will formalize this world and endow it with the structure of a
constructive category.

Looking at graphs as a replacement of formulas in a language, we define 
models/interpretations as functors on various categories of decorated graphs. 


5.2. Graphs. One usually imagines a graph as a picture, or better, a topolog- 
ical space, consisting of several points (vertices) pairwise connected by several 
(curvi)linear segments (edges). 

We will consider each edge as consisting of two “halves” (flags), issuing from 
their respective vertices and joined at the edge’s midpoint. Moreover, we will 
allow certain flags not to be paired into edges; they will be called tails. 

A combinatorial graph is a collection of two abstract sets and two incidence 
relations. Here is a formal definition. 


5.3. Definition. A combinatorial graph, or simply graph, $\tau$ is a quadruple
$(F_\tau, V_\tau, \partial_\tau, j_\tau)$, where $F_\tau, V_\tau$ are finite sets (elements of a constructive world),
and $(\partial_\tau, j_\tau)$ are maps. Elements of $F_\tau$ are called flags of $\tau$, elements of $V_\tau$ are
called vertices of $\tau$; vertices and flags are disjoint. The map $\partial_\tau: F_\tau \to V_\tau$
associates to each flag a vertex, its boundary. The map $j_\tau: F_\tau \to F_\tau$ is an
involution: $j_\tau^2 = \mathrm{id}$.


(a) Marginal cases. If $V_\tau$ is empty, $F_\tau$ must be empty as well. This defines an
empty graph. In contrast, $F_\tau$ might be empty whereas $V_\tau$ is not.

(b) Corollas, tails, edges. One-vertex graphs with identical $j_\tau$ are called corollas.
Let v be a vertex of $\tau$, $F_\tau(v) := \partial_\tau^{-1}(v)$. Then
$\tau_v := (F_\tau(v), \{v\}, \text{evident } \partial, \text{identical } j)$ is a corolla, which is called the corolla of v in $\tau$.


Flags fixed by $j_\tau$ form the set of tails of $\tau$ denoted by $T_\tau$.
Two-element orbits of $j_\tau$ form the set $E_\tau$ of edges of $\tau$. Elements of such an
orbit are called halves of the respective edge.
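
This definition translates almost literally into a data structure. The sketch below is one possible encoding, with flags and vertices represented by strings and the two maps by dictionaries; the class name `Graph` and its method names are our own choices.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass
class Graph:
    flags: Set[str]
    vertices: Set[str]
    boundary: Dict[str, str]   # flag -> vertex (the boundary map)
    inv: Dict[str, str]        # flag -> flag (the involution j)

    def validate(self) -> None:
        """Check the axioms of a combinatorial graph."""
        if not self.flags.isdisjoint(self.vertices):
            raise ValueError("flags and vertices must be disjoint")
        if set(self.boundary) != self.flags or not set(self.boundary.values()) <= self.vertices:
            raise ValueError("boundary must send every flag to a vertex")
        if set(self.inv) != self.flags or not set(self.inv.values()) <= self.flags:
            raise ValueError("j must send flags to flags")
        if any(self.inv[self.inv[f]] != f for f in self.flags):
            raise ValueError("j must be an involution")

    def tails(self) -> Set[str]:
        """Flags fixed by j."""
        return {f for f in self.flags if self.inv[f] == f}

    def edges(self) -> Set[FrozenSet[str]]:
        """Two-element orbits of j."""
        return {frozenset((f, self.inv[f])) for f in self.flags if self.inv[f] != f}

    def corolla(self, v: str) -> "Graph":
        """The corolla of v: its flags, the single vertex v, identical j."""
        fl = {f for f in self.flags if self.boundary[f] == v}
        return Graph(fl, {v}, {f: v for f in fl}, {f: f for f in fl})


# Two vertices joined by one edge {b, c}, with one tail at each vertex.
g = Graph(
    flags={"a", "b", "c", "d"},
    vertices={"v", "w"},
    boundary={"a": "v", "b": "v", "c": "w", "d": "w"},
    inv={"a": "a", "b": "c", "c": "b", "d": "d"},
)
g.validate()
```

Note that in the corolla of v the flag b becomes a tail, although it is half of an edge in the ambient graph.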


5.4. Geometric realization of a graph. First, let $\tau$ be a corolla. If its set of
flags is empty, its geometric realization $|\tau|$ is, by definition, a point. Otherwise
construct a disjoint union of segments [0,1/2] bijectively indexed by flags, and
identify in it all points 0. This is $|\tau|$. The image of all 0’s thus becomes the
geometric realization of the unique vertex of $\tau$.

Generally, to construct $|\tau|$ take a disjoint union of geometric realizations of
corollas of all vertices and identify points 1/2 of any two flags forming an orbit
of $j_\tau$, that is, an edge.

A graph $\tau$ is called connected (resp. simply connected, resp. tree, etc.) iff its
geometric realization is such. In the same vein, we can speak about connected
components of a graph, etc. Vertices v with empty $F_\tau(v)$ are considered connected
components.


5.5. Decorations. We will not try to axiomatize a general notion of decoration,
and only list some classes of them most useful for describing flowcharts.


(a) Orientations. Any map $F_\sigma \to \{in, out\}$ such that halves of any edge are
oriented by different labels is called an orientation of $\sigma$. On the geometric
realization, a flag marked by in (resp. out) is oriented toward (resp. away
from) its vertex.


Tails of $\sigma$ oriented in (resp. out) are called (global) inputs (resp. (global)
outputs) of $\sigma$. Similarly, $F_\sigma(v)$ is partitioned into inputs and outputs of the
vertex v.

Consider an orientation of $\sigma$. Its edge is called an oriented loop if both its
halves belong to the same vertex. Otherwise, an oriented edge starts at a source
vertex and ends at a different target vertex.

More generally, a sequence of distinct edges $e_1, \ldots, e_n$ is called a simple
path of length n if $e_i$ and $e_{i+1}$ have a common vertex and the n - 1 vertices
obtained in this way are distinct. If, moreover, $e_1$ and $e_n$ also have a common
vertex distinct from the mentioned ones, this path is a wheel of length n. A loop
is a wheel of length one. Edges in a wheel are endowed only with a cyclic order
up to inversion.

Clearly, all edges in a path (resp. a wheel) can be oriented so that the source
of $e_{i+1}$ is the target of $e_i$.

If the graph is already oriented, the induced orientation on any path (resp.
wheel) either has this property or does not. Respectively, the path is called
oriented or not.

(b) Directed graphs. An oriented graph $\sigma$ is called directed if it satisfies the
following condition:

On each connected component of the geometric realization, one can define a
continuous real-valued function (“height”) in such a way that moving in the
direction of orientation along each flag decreases the value of this function.

In particular, a directed graph has no oriented wheels.


Notice that, somewhat counterintuitively, a directed graph is not necessarily 
oriented “from its inputs to its outputs” as is usually shown on illustrating 
pictures. In effect, take a corolla with only in flags and another corolla with only 
out flags, and graft one input to one output. The resulting graph is directed 
(check this) although its only edge is oriented from global outputs to global 
inputs. 
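
The example can be checked mechanically. The sketch below rests on an assumption that is ours: for a finite oriented graph, a height function exists exactly when the vertices admit a topological order with respect to the oriented edges, so directedness is tested by Kahn's algorithm. The function name `is_directed` and the flag names are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def is_directed(vertices: Set[str], boundary: Dict[str, str],
                inv: Dict[str, str], orient: Dict[str, str]) -> bool:
    """Return True if the oriented graph has no oriented cycles."""
    succ = {v: set() for v in vertices}
    indeg = {v: 0 for v in vertices}
    for f, g in inv.items():
        if f == g:
            continue                      # tails impose no constraint
        if orient[f] == orient[g]:
            raise ValueError("halves of an edge need different labels")
        if orient[f] == "out":            # visit each edge once, from its out-half
            s, t = boundary[f], boundary[g]
            if s == t:
                return False              # an oriented loop
            if t not in succ[s]:
                succ[s].add(t)
                indeg[t] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while queue:                          # Kahn's algorithm
        v = queue.popleft()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(vertices)


# Corolla A with only in-flags, corolla B with only out-flags, i1 grafted to o1.
vertices = {"A", "B"}
boundary = {"i1": "A", "i2": "A", "o1": "B", "o2": "B"}
inv = {"i1": "o1", "o1": "i1", "i2": "i2", "o2": "o2"}
orient = {"i1": "in", "i2": "in", "o1": "out", "o2": "out"}
example_is_directed = is_directed(vertices, boundary, inv, orient)
```

In this example the only edge runs from B, which carries the global output o2, to A, which carries the global input i2, and the graph is nevertheless directed.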

This is one reason why it is sometimes sensible to consider only those
directed graphs that have at least one input and at least one output at each
vertex.


(c) Labeling of vertices. A labeling of vertices by a set S is a map $V_\tau \to S$. As
above, S may consist, e.g., of names of basic functions.

(d) Coloring of flags. A coloring of flags by a set I is a map $F_\tau \to I$. In the
context of flowcharts, we can imagine, for example, that we start with a
family of objects $\{U_i \mid i \in I\}$, and want to describe morphisms between
products of such objects. Then the color i of an input/output will specify
that this input/output must be taken from $U_i$. In this case halves of an edge
must have the same color.


Even if we have only one object in this family, we may want to totally order
the sets of inputs/outputs of each vertex. This is what is needed to present the
vertex as encoding a map $U^m \to U^n$ rather than a map $U^{\{inputs\}} \to U^{\{outputs\}}$
and make a direct connection with the world of descriptions, using traditional
notation for functions, such as $(f_1(u_1, \ldots, u_m), \ldots, f_n(u_1, \ldots, u_m))$. Such a total
ordering of, say, inputs is equivalent to their coloring by {1,...,m}. This is
the case when an ordering is not intrinsically needed, but used only in the
comparison of flowcharts with descriptions.

We will now explain that after introducing morphisms of graphs, we will 
be able to efficiently use them to encode operations and identities between 
operations. 


5.6. Isomorphisms of graphs. The notion of isomorphism is (almost) straightforward:
an isomorphism $h: \tau \to \sigma$ consists of two bijections

$$h_V: V_\tau \to V_\sigma, \qquad h^F: F_\sigma \to F_\tau$$

commuting with boundary and involution maps. Composition is composition of
maps.

Notice, however, one peculiarity: $h_V$ is covariant, whereas $h^F$ is contravariant.
This choice can be explained using the intuition behind flowcharts: a change
of arguments produces the lift of functions in the reverse direction.


5.7. Groupoid of corollas Cor. Consider first the category (groupoid) of 
oriented corollas with isomorphisms preserving orientation. 

It is equivalent to the groupoid whose objects are ordered pairs of sets
({1,...,m}, {1,...,n}) and morphisms are permutations acting on two sets
separately.

5.8. Claim. A collection P in a category B (cf. Definition 4.3) is “the same as”
a B-valued functor P on the groupoid of oriented corollas.

In fact, P(n,m) can be identified with the value of P on a corolla with
inputs {1,...,m} and outputs {1,...,n}. The action of the symmetric groups is determined
by values of P on the automorphisms of this corolla.


5.9. Disjoint sums of corollas and mergers. A graph $\tau = (F_\tau, V_\tau, \partial_\tau, j_\tau)$
is called a disjoint sum of corollas if its set of edges is empty. Equivalently, all
flags are tails.

Let $\tau, \sigma$ be disjoint sums of corollas. Define a merger morphism $\tau \to \sigma$ as a
pair of maps, compatible with boundaries,

$$h_V: V_\tau \to V_\sigma, \qquad h^F: F_\sigma \to F_\tau$$

such that $h_V$ is a surjection and $h^F$ is a bijection. Composition of mergers is
obviously a merger. If $\sigma$ is a corolla, h is called a total merger.

We will assume that a monoidal structure disjoint union $\coprod$ on C is chosen
and fixed; it can be naturally extended to graphs and then restricted to the
category of disjoint sums of corollas.

Denote by DCor the category of disjoint sums of corollas with compositions 
of mergers and automorphisms as morphisms. 


5.10. Claim. A collection P in a symmetric monoidal category $(B, \times)$, endowed
with horizontal products 4.5 (b) satisfying the associativity conditions 4.5 (iii)
and compatibility with the action of symmetric groups 4.5 (v), is “the same as” a
symmetric monoidal functor

$$P: (DCor, \coprod) \to (B, \times).$$


In fact, horizontal products as given in 4.5 are simply values of P on obvious 
total mergers. 

A stylistic remark: the quotation marks around the expression “the same 
as” are supposed to alert the reader to the fact that Claim 5.10 must in fact be 
understood as the first definition of a collection with horizontal compositions. 
Having avoided a precise statement of the compatibility conditions 4.5 (iii) and 
4.5 (v), we now simply hide them in the standard definition of a (symmetric 
monoidal) functor and implicit combinatorics of mergers and isomorphisms. 

We still do not have enough morphisms to give a definition of PROPs as 
functors. We will now supply them, by introducing contraction morphisms. 


5.11. Definition. (a) A contraction morphism $h: \tau \to \sigma$ is a pair of maps

$$h_V: V_\tau \to V_\sigma, \qquad h^F: F_\sigma \to F_\tau$$

such that $h^F$ is an injection bijective on tails, $h_V$ is a surjection, and any two
vertices in a fiber $h_V^{-1}(v)$ can be connected by a path consisting of edges whose
halves lie in $F_\tau \setminus h^F(F_\sigma)$.


(b) If $\sigma, \tau$ are oriented, $h^F$ must be compatible with orientation.

5.12. Application to PROPs. In geometric realizations, a contraction
morphism induces a map that boils down to the geometric contraction of a
subgraph of $\tau$ consisting of edges in $F_\tau \setminus h^F(F_\sigma)$.


Let us show how combined grafting and contraction of flowcharts allows us 
to interpret functorially the composition of morphisms in Prop End (U), that 
is, vertical products in 4.5 (a). 


Namely, first we interpret $\mathcal{E}(U^m, U^n)$ as the value of a functor P: DCor $\to$
Sets on sums of oriented corollas endowed with automorphisms and mergers.
Now extend the category DCor to include morphisms that can be obtained as 
graftings followed by contractions (and, of course, products of such morphisms). 
Our functor P has a natural extension to this larger category. In particular, if 
we take the union of two oriented corollas, graft bijectively outputs of the first 
one to inputs of the second one, and then contract all edges obtained in this 
way, we will get a morphism in the extended DCor, and the value of P on it 
will be the composition map 4.5 (a). 


We will now present another category of decorated graphs that can be used 
to generate descriptions of (primitive) recursive functions. This is a modified 
version of a part of Yanofsky’s preprint math.CT/0609748.


5.13. The constructive world of decorated graphs Prim. Elements of Prim 
are disjoint unions of trees T in which each vertex is the boundary of at least 
two flags. Moreover, 7 must be endowed with an admissible decoration. The 
latter consists of the following data. They can be chosen independently on each 
connected component so that in the following discussion we speak about trees 
if we have not explicitly mentioned the general case. 


(a) A marked tail, which is called the root, or the (global) output of $\tau$. Its
vertex is called the root vertex. The remaining tails are called (global) inputs of $\tau$.
Global inputs form a set $F_\tau^{in} \subset F_\tau$, and we consider the global output as a
one-element subset $F_\tau^{out} \subset F_\tau$.


A choice of root determines (and is equivalent to) the choice of a specific
orientation: a map $F_\tau \to \{in, out\}$. Namely, in each shortest path (sequence
of flags) from a global input to the root, assign out to the flag that leaves its 
vertex, and in to the flag that enters it. This defines the partition of all flags 
into two subsets: (local) inputs and outputs. 


We will say that $\tau$ with such a decoration is an oriented tree. We repeat
that by definition, each oriented tree must have exactly one global output and 
at least one global input. 


(b) All corollas of an oriented tree are also oriented trees. The next part of 
a decoration is a choice of total order on the set of inputs of each corolla of $\tau$,
and, if $\tau$ is not connected, a choice of total order on the set of its connected
components. 


(c) A map arity/coarity: $F_\tau \to \mathbf{N}^2$: $f \mapsto (a(f), c(f))$. If two flags are halves
of an edge, they must be assigned the same arity/coarity. 


(d) A map op: $V_\tau \to \{c, b, r\}$. The value op(v) assigned to a vertex is called
the respective operator: c,b,r stand respectively for composition, bracketing, 
recursion. 

(e) A map in: $F_\tau^{in} \to$ {basic recursive functions} such that for each
$i \in F_\tau^{in}$, in(i) is a basic function of arity a(i) and coarity c(i).


All these data must be compatible. A part of the compatibility conditions 
was already included in the description. We will now formally introduce the 
remaining set, and simultaneously explain an interpretation of graphs in Prim 
(without decoration 5.13 (e)) as operations acting on families of input functions.


5.14. Objects of Prim as flowcharts. Given an oriented tree $\tau$ with a decoration
as above, we interpret the whole of $\tau$ as a symbol of an operation Op($\tau$)
that can be performed over families of functions, indexed by global inputs of $\tau$.

More precisely, let $f = \{f_i \mid i \in F_\tau^{in}\}$ be a family of functions (or even partial
functions) such that $f_i : (\mathbf{Z}^+)^{a(i)} \to (\mathbf{Z}^+)^{c(i)}$. Then


$\mathrm{Op}(\tau)(f) = g : (\mathbf{Z}^+)^a \to (\mathbf{Z}^+)^c,$


where (a,c) is the arity/coarity of the root.

The prescription for getting g, given f, runs as follows. 

One-vertex case. Let $\tau$ be a corolla whose vertex is decorated by c, b,
or r. Then g is obtained by applying to the family $\{f_i\}$, $i \in F_\tau^{in}$, the respective
elementary operation: composition, bracketing, or recursion. This requires the 
following compatibilities, which vary depending on the label of the vertex. 


(a) Composition. Let $(a_1, c_1), \dots, (a_r, c_r)$ be the family of arities/coarities
of inputs ordered as the respective flags. They must then be constrained by the
condition $c_1 = a_2, \dots, c_{r-1} = a_r$, and the arity/coarity of the output must be
$(a_1, c_r)$.


For a general $\tau$, these compatibility conditions must be satisfied for all
corollas $\tau_v$ of all vertices decorated by c.

In the flowchart interpretation, such a corolla transforms an input family
$(f_1, \dots, f_r)$, $f_i : (\mathbf{Z}^+)^{a_i} \to (\mathbf{Z}^+)^{c_i}$, into the composition $f_r \circ f_{r-1} \circ \dots \circ f_1$.

Notice an essential difference in treating compositions in the context of 
PROPs, resp. Prim: for PROPs, we graft and contract, whereas for Prim, 
we endow a vertex with the task of composing. 

This is because the corollas for PROPs are flowcharts accepting arguments 
from, say, $(\mathbf{Z}^+)^m$ and producing a vector in $(\mathbf{Z}^+)^n$, whereas decorated trees in
Prim accept and produce arguments that are themselves vectors of functions, 
and we want to compose these functions rather than programs producing them. 


(b) Bracket. With the same notation as in (a), the compatibility condition
reads $a := a_1 = \dots = a_r$, and the arity/coarity of the output must be
$(a, c_1 + \dots + c_r)$.


For a general $\tau$, these compatibility conditions must be satisfied for all
corollas $\tau_v$ of all vertices decorated by b and respective orderings.

In the flowchart interpretation, such a corolla transforms an input family
$(f_1, \dots, f_r)$, $f_i : (\mathbf{Z}^+)^{a} \to (\mathbf{Z}^+)^{c_i}$, into the map


$(f_1, \dots, f_r) : (\mathbf{Z}^+)^{a} \to (\mathbf{Z}^+)^{c_1 + \dots + c_r}.$


It was called juxtaposition in V.2.3 (b).


(c) Recursion. If a vertex is decorated by r, it must have exactly two local 
inputs. If the arity/coarity of the first one (in their structure order) is (a,c), 
for the second one it must be (a + c,c), and for the local output it must be 
(a+1,c). This is our compatibility condition. 

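In the flowchart interpretation, such a vertex takes as input two arbitrary
maps $f_1 : (\mathbf{Z}^+)^{a} \to (\mathbf{Z}^+)^{c}$, $f_2 : (\mathbf{Z}^+)^{a+c} \to (\mathbf{Z}^+)^{c}$ and produces the output


$g : (\mathbf{Z}^+)^{a+1} \to (\mathbf{Z}^+)^{c}$
defined recursively as


$g(x, 1) = f_1(x),$
$g(x, k+1) = f_2(x, g(x, k))$


for each $x \in (\mathbf{Z}^+)^{a}$, $k \in \mathbf{Z}^+$.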

This form of recursion is more restrictive than the one that is often used: 
it does not allow fz to depend explicitly on the recursion parameter k. How- 
ever, R. M. Robinson proved in 1947 that it suffices to use it in order to get 
all primitive recursive functions if an extension of the list of basic functions is 
allowed. Afterward, M. D. Gladstone showed that such an extension is unnec- 
essary (Jour. Symb. Logic, 32:4 (1967), 505-508). I am grateful to N. Yanofsky 
for these references. 
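
The three one-vertex operations can be made concrete in a few lines. The following is only a sketch under our own conventions: vectors in $(\mathbf{Z}^+)^a$ are modeled as Python tuples, and the names `compose`, `bracket`, `recursion`, `succ`, `double`, and `add` are illustrative choices, not notation from the text. Arity/coarity compatibility is assumed rather than checked.

```python
from typing import Callable, Sequence, Tuple

Vec = Tuple[int, ...]          # a vector of positive integers
Fn = Callable[[Vec], Vec]      # a map (Z+)^a -> (Z+)^c

def compose(fs: Sequence[Fn]) -> Fn:
    """Vertex labelled c: (f_1, ..., f_r) -> f_r o ... o f_1."""
    def g(x: Vec) -> Vec:
        for f in fs:           # f_1 is applied first, f_r last
            x = f(x)
        return x
    return g

def bracket(fs: Sequence[Fn]) -> Fn:
    """Vertex labelled b: juxtapose the outputs of maps with a common arity."""
    def g(x: Vec) -> Vec:
        out: Vec = ()
        for f in fs:
            out = out + f(x)
        return out
    return g

def recursion(f1: Fn, f2: Fn) -> Fn:
    """Vertex labelled r: g(x, 1) = f1(x), g(x, k+1) = f2(x, g(x, k))."""
    def g(xk: Vec) -> Vec:
        x, k = xk[:-1], xk[-1]          # split off the recursion parameter k >= 1
        value = f1(x)
        for _ in range(k - 1):
            value = f2(x + value)       # f2 sees x followed by the previous value
        return value
    return g

# Illustrative inputs (assumptions of this sketch).
succ = lambda v: (v[0] + 1,)
double = lambda v: (2 * v[0],)
# add(x, k) = x + k, obtained from add(x, 1) = x + 1 and add(x, k+1) = add(x, k) + 1.
add = recursion(succ, lambda v: (v[1] + 1,))
```

Note that `f2` receives only `x` and the previous value, never `k` itself, which is exactly the restriction discussed above.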

General case. First consider a connected graph $\tau$. Assume that it has $\geq 2$
vertices. We define the operation Op($\tau$) by induction on the number of vertices.

Namely, for a vertex v that is the boundary of a global input, consider the
subfamily $f_v := \{f_i \mid \partial_\tau(i) = v\}$. Denoting by $\tau_v$ the corolla of v (an in-corolla),
calculate $g_v := \mathrm{Op}(\tau_v)(f_v)$ as specified above.

One can check that this prescription produces the result independent of 
arbitrary choices. 

Now consider the maximal decorated subtree $\tau^\circ$ of $\tau$ whose flags and
vertices do not belong to this in-corolla. Its global inputs consist of all global
inputs of $\tau$ not adjacent to v, and $j_\tau(r)$, where r is the root of our corolla.
Decoration of $\tau^\circ$ is the restriction of that of $\tau$; global inputs of $\tau$ retain also
their input functions $f_i$. Decorate the input $j_\tau(r)$ by $g_v$ and put


$\mathrm{Op}(\tau)(\{f_i\}) = \mathrm{Op}(\tau^\circ)(\{f_i, g_v \mid \partial_\tau(i) \neq v\}).$


The right-hand side is defined due to the inductive assumption. 


Finally, if $\tau$ is the disjoint union of connected components $\coprod_{a \in A} \tau_a$, we put


$\mathrm{Op}\Big(\coprod_{a \in A} \tau_a\Big) = \prod_{a \in A} \mathrm{Op}(\tau_a)$


in the sense that Op($\tau$) acts on the family, naturally indexed by A, of (families
of) global inputs of connected components, and produces the family of outputs,
also naturally indexed by A.
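
The inductive prescription amounts to evaluating a decorated tree from its global inputs toward the root. Here is a minimal sketch, assuming a hypothetical encoding of our own: a global input is represented directly by its input function, and an inner vertex by a pair (label, ordered list of subtrees). The names `apply_vertex`, `op`, `op_forest`, and the sample trees are illustrative only.

```python
from typing import Callable, List, Tuple, Union

Vec = Tuple[int, ...]
Fn = Callable[[Vec], Vec]
Tree = Union[Fn, Tuple[str, list]]   # leaf: input function; vertex: (label, children)

def apply_vertex(label: str, fs: List[Fn]) -> Fn:
    """One-vertex case: apply the operator c, b, or r to an ordered family."""
    if label == "c":
        def g(x: Vec) -> Vec:
            for f in fs:
                x = f(x)
            return x
    elif label == "b":
        def g(x: Vec) -> Vec:
            return tuple(y for f in fs for y in f(x))
    elif label == "r":
        f1, f2 = fs                          # exactly two local inputs
        def g(xk: Vec) -> Vec:
            x, k = xk[:-1], xk[-1]
            value = f1(x)
            for _ in range(k - 1):
                value = f2(x + value)
            return value
    else:
        raise ValueError(f"unknown operator {label!r}")
    return g

def op(tree: Tree) -> Fn:
    """Op(tau): evaluate the subtrees first, then the corolla of the root vertex."""
    if callable(tree):                       # a global input decorated by a function
        return tree
    label, children = tree
    return apply_vertex(label, [op(child) for child in children])

def op_forest(trees: List[Tree]) -> List[Fn]:
    """Disjoint union: one output per connected component, in the given order."""
    return [op(t) for t in trees]

# Illustrative decorated trees (assumptions of this sketch).
succ = lambda v: (v[0] + 1,)
proj2 = lambda v: (v[1],)
ident = lambda v: v
add = ("r", [succ, ("c", [proj2, succ])])    # add(x, k) = x + k
mul = ("r", [ident, add])                    # mul(x, k) = x * k
```

Evaluating children before their parent is one admissible order of removing in-corollas; the independence of arbitrary choices asserted above is what makes this order harmless.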

As we implied in the previous discussion, we can apply Op($\tau$) to families
consisting not necessarily of basic, or even recursive, functions. 

But if we want to define programming methods based upon Prim, then we 
must decorate global inputs by some basic functions, and interpret the resulting 
decorated tree as a program producing one concrete recursive function.

Here the choice becomes ambiguous: we may change the list of basic func- 
tions, and we may allow the application of c,b,r to some restricted class of 
subfamilies, getting the more general cases from trees larger than corollas. 

For c and b, we allowed arbitrary natural families, implicitly using associa- 
tivity of intended interpretations. Yanofsky allows only two inputs. For r, we 
essentially adhered in 5.14 (c) to the choice made by Yanofsky.


5.15. Prim as a world of programming methods. We now define 
Prim(m,n) as the subset of Prim consisting of graphs whose outputs (roots 
of connected components) have the total arity/coarity (m, n).

The evaluation morphism in C


$ev_{P(m,n)} : P(m,n) \times (\mathbf{Z}^+)^m \to (\mathbf{Z}^+)^n$
we have already essentially described. Namely,


$ev_{P(m,n)}(\tau; x_1, \dots, x_m) = f_\tau(x_1, \dots, x_m),$


where $f_\tau$ is the total output of the flowchart $\tau$, which we formerly denoted by
Op($\tau$), applied to the input decorations of $\tau$.
A computable multiple composition morphism (cf. 3.4 above) 


$comp : P(m_{r-1}, m_r) \times \dots \times P(m_2, m_3) \times P(m_1, m_2) \to P(m_1, m_r)$


can be constructed as follows. For simplicity, we will describe only the composite
$comp(\tau_r, \tau_{r-1}, \dots, \tau_1)$ for an r-tuple of decorated trees $\tau_1, \tau_2, \dots, \tau_r$.

Consider a corolla with vertex decorated by c, r inputs decorated by the
arities $(m_1, m_2), \dots, (m_{r-1}, m_r)$, and an output decorated by $(m_1, m_r)$. Graft
inputs of this corolla to the roots of $\tau_1, \dots, \tau_r$ respectively. The resulting tree
represents the composition. 

Of course, on the combinatorial level, we will have to make a stupid choice 
of some “concrete” vertex and flags of this corolla, but the result will be unique 
up to unique isomorphism identical on the component trees $\tau_i$.

However, if we iterate partial compositions that on the level of maps correspond,
say, to $h \circ g \circ f$, $(h \circ g) \circ f$, and $h \circ (g \circ f)$ respectively, we will get three
different decorated trees, say $\sigma_{123}$, $\sigma_{12,3}$, $\sigma_{1,23}$.

On the combinatorial/geometric level these trees are interconnected by two
contraction morphisms (cf. 5.11) $\sigma_{12,3} \to \sigma_{123}$ and $\sigma_{1,23} \to \sigma_{123}$ that contract
the edges entering the root vertices, whose ends are marked by c. One can
simply declare that such contractions generate an equivalence relation on the
elements of Prim, and that algorithms encoded by Prim are actually such
(or even bigger) equivalence classes rather than isomorphism classes of the
decorated trees.

However, since we work in a categorical context, and strive to produce a 
category of algorithms in the sense of Definition 3.5, a better way to act is to 
organize Prim into a constructive category, and then to localize it with respect 
to those morphisms $\tau \to \sigma$ that produce a natural identification of Op($\tau$) and
Op($\sigma$).

Recall that the localization of a category B with respect to a set of its morphisms
S is a functor $L : B \to B[S^{-1}]$ that makes all morphisms in S invertible
and that is the initial object among all functors with this property.

Here is a simple version of this construction. 


5.16. Definition—Claim. Consider the category Pr whose set of objects is the 
set Prim, and morphisms are compositions of the following maps of decorated 
graphs: 


(i) Isomorphisms. 

(ii) Contractions of subtrees of the following type: all vertices of such a subtree 
are decorated by c. After the contraction, the resulting vertex must be 
marked by c. The remaining decorations do not change. 

(iii) Contractions of subtrees, all of whose vertices are decorated by b. After
the contraction, the resulting vertex must be marked by b. The remaining 
decorations do not change. 


Denote by P the localization of Pr with respect to all morphisms. It has the 
natural structure of a category of programming methods for which composition 
and bracket operations become associative. 
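
The effect of the contractions (ii) and (iii) can be illustrated on a toy encoding. This is a sketch under our own assumptions: a tree is a pair (label, ordered list of children), global inputs are plain strings, and `contract` is a hypothetical helper that contracts every edge joining two vertices both marked c or both marked b. Vertices marked r are left alone.

```python
def contract(tree):
    """Contract all edges between like-labelled c-vertices or b-vertices."""
    if not isinstance(tree, tuple):
        return tree                        # a global input (leaf)
    label, children = tree
    flat = []
    for child in map(contract, children):
        if isinstance(child, tuple) and child[0] == label and label in ("c", "b"):
            flat.extend(child[1])          # splice the child's inputs in place, keeping order
        else:
            flat.append(child)
    return (label, flat)

# The three trees for the three ways of composing f, g, h.
sigma_123 = ("c", ["f", "g", "h"])
sigma_12_3 = ("c", [("c", ["f", "g"]), "h"])
sigma_1_23 = ("c", ["f", ("c", ["g", "h"])])
```

In this toy model the two nested trees contract to the same flat tree, which is the combinatorial shadow of associativity of composition after localization.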

One can similarly accommodate more sophisticated equivalence relations 
between decorated trees, studied by Yanofsky. 

To this end one can extend the category Pr by some extra morphisms, and 
then localize with respect to them as well. 


6 Models of Computation and Complexity 


In this section we are gradually zooming, passing from the macroscopic view 
of the constructive universe to “human scale” to microscopic (Boolean and 
Turing’s) level. 


6.1. Normal models. Let U be an infinite set. In this subsection we will be 
considering partial functions U — U that can be constructed by iteration. In 
other contexts, they might be called dynamical systems with discrete time, or 
cascades. 

A normal model of computation M is the structure (P, U, I, F, s) consisting
of four sets and a map


$I, F \subset P \times U, \quad s : P \times U \to P \times U.$
Here s is an everywhere defined function such that $s(p,u) = (p, s_p(u))$ for
any $(p,u) \in P \times U$. Intuitively, p is a program, u is a configuration of the
deterministic discrete-time computing device, and $s_p(u)$ is the new configuration
obtained from u after one unit of time (clock tick). The subset I is that of initial
data, or inputs. The subset $F \subset P \times U$ (final configurations, outputs) must be
a part of the set of fixed points of s: if $(p,u) \in F$, then $s(p,u) = (p,u)$.

In this setting, we denote by $f_p$ the partial function $f_p : U \to U$ such that
we have $u \in D(f_p)$, $f_p(u) = v$, if and only if


$(p,u) \in I$, and for some $n \geq 0$, $(p, s_p^n(u)) \in F$ and $s_p^n(u) = v$.


The minimal such n will be called the time (number of clock ticks) needed to
calculate $f_p(u)$ using the program p.
Any finite sequence


$(p, u, s_p(u), \dots, s_p^m(u)), \quad (p,u) \in I,$


will be called a protocol of computation of length m for the model M.
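
The definition of $f_p$, of the computation time, and of a protocol can be traced in a short simulation. The following is a sketch with hypothetical names: `step` plays the role of $s_p$, while `is_initial` and `is_final` decide membership in I and F. The cutoff `max_ticks` is an assumption of this sketch only, since a genuine computation need not halt.

```python
def run(step, is_initial, is_final, p, u, max_ticks=10_000):
    """Return (f_p(u), time, protocol), or None if (p, u) is not in I
    or no final configuration is reached within max_ticks clock ticks."""
    if not is_initial(p, u):
        return None
    protocol = [u]                       # u, s_p(u), s_p^2(u), ...
    for n in range(max_ticks + 1):
        if is_final(p, u):
            return u, n, protocol        # n is the minimal number of ticks
        u = step(p, u)
        protocol.append(u)
    return None

# Toy model (an assumption for illustration): P = positive integers, U = naturals,
# s_p(u) = u - p while u >= p, and F = {(p, u) : u < p}, so that f_p(u) = u mod p.
step = lambda p, u: u - p if u >= p else u
is_initial = lambda p, u: True
is_final = lambda p, u: u < p
```

In the toy model every element of F is a fixed point of s, as the definition requires, and the returned time is the length m of the protocol that stops at F.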

We now add the constructivity conditions. 

We require P, U to be constructive worlds, s computable. In addition, we
require I, F to be decidable subsets of $P \times U$. Then $f_p$ are computable, and
protocols of given length (resp. of arbitrary length, resp. those stopping at F)
form constructive worlds. If we denote by $Q_M$ the world of protocols stopping at
F and by $ev : Q_M \times U \to U$ the map $(p,u) \mapsto s_p^{\max}(u)$, we get a programming
method.

Such a model M is called versal if the respective programming method $Q_M$
is versal.

The notion of normal model of computation includes both normal algorithms 
and Turing machines. 

Consider, for example, the standard description of the constructive world 
T of Turing machines T slightly adapted to our conventions. It includes the 
following data: 


(a) The constructive world U = {0,1}* of, say, binary words that can be written 
on the tape of any T from our world. 

(b) For each T, a finite set of internal states $J_T$, containing the initial state, the
accepting state, the rejecting state, and the remaining intermediate states $J_T^0$. All
$J_T$ must be elements of a constructive world of states J, and the map
$T \mapsto J_T$ must be computable.

(c) The computable partial map $\tau : J \times \mathbf{N} \times U \to J \times \mathbf{N} \times U$, where $\mathbf{N}$ are natural
numbers (including 0). For each T, it must send the subset $J_T \times \mathbf{N} \times U$ into
itself.


A triple $(i,n,u) \in J_T \times \mathbf{N} \times U$ is the configuration of T in which T is in state i,
and the head is scanning the nth square of the tape (the initial bit of u is counted
as the first square, the square to the left of it is the zeroth square). The domain
of definition of $\tau_T$ consists only of those triples for which $n \leq |u| + 1$, where
|u| is the length of u: the head must scan either one of the bits of u, or one
of the next-door neighbors. The triple $\tau_T(i,n,u) = (i_1, n_1, u_1)$ depicts the next
internal state of the machine, position of the head, and the new word on the
tape. The usual restrictions on the $\tau_T$ are $n_1 = n \pm 1$, and $u_1$ may differ from
u only at the nth bit.
The fixed points of $\tau$ are triples for which i = accepting or rejecting state.
We can reduce such a description to our normal form by putting $U = \{0,1\}^*$,


$P := J \times \mathbf{N}, \quad I := \{\text{initial states}\} \times \{1\} \times U.$


States F are those triples (accepting state, n, u) that can be reached from some
point of I after a finite iteration of $\tau$. Finally, to get an everywhere defined s
coinciding with $\tau$ on its definition domain, we can extend $\tau$ to a computable map
in some trivial way. For example, starting with some triple (i, n, u) not in I, we
can prescribe s to move the head to the left until it reaches the first nonempty
tape square, to continue moving until it reaches the next empty square, and
then move one square to the right.

Turing machines have one feature that we did not keep in our definition 
of normal models. It is sometimes called locality of the iteration map, which 
depends only on a restricted number of bits of the current position and
changes only a restricted number of bits in moving to the next position. 
Discussing complexity later, we will suggest a useful and sufficiently general 
weakening of this requirement. 


6.2. Boolean circuits. Boolean circuits are classical models of computation 
well suited for studying maps between the finite sets whose elements are encoded 
by binary words. Discussing them, we will identify the alphabet {0,1} with the 
2-element field $\mathbf{F}_2$.

Consider the commutative polynomial algebra generated over $\mathbf{F}_2$ by a countable
sequence of independent variables, say $x_1, x_2, x_3, \dots$. Define the Boolean
algebra B as the quotient algebra of $\mathbf{F}_2[x_1, x_2, \dots]$ modulo the ideal generated
by polynomials $x_i^2 - x_i$. Each Boolean polynomial, element of B, determines a
function on $\bigoplus_{i=1}^{\infty} \mathbf{F}_2$ with values in $\mathbf{F}_2 = \{0,1\}$.

We start with the following simple fact. 


6.3. Claim. Any map $f : \mathbf{F}_2^m \to \mathbf{F}_2^n$ can be represented by a unique vector of
Boolean polynomials.


Proof. It suffices to consider the case n = 1. Then the map sending Boolean
polynomials in $x_1, \dots, x_m$ to functions is surjective, because f is represented by


$f(x_1, \dots, x_m) = \sum_{y = (y_i) \in \mathbf{F}_2^m} f(y) \prod_{i=1}^{m} (x_i + y_i + 1).$


In fact, the product at f(y) is the Kronecker delta $\delta_{x,y}$.

Moreover, the vector spaces of such maps and of Boolean polynomials over
$\mathbf{F}_2$ have the common dimension $2^m$. In fact, Boolean polynomials are represented
by linear combinations of monomials $x_{i_1} \cdots x_{i_k}$, one for each subset
$\{i_1, \dots, i_k\} \subset \{1, \dots, m\}$. This completes the proof.
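
Both halves of the proof can be checked mechanically for small m. The sketch below uses hypothetical names: `delta_polynomial` evaluates the Kronecker delta sum literally, and `monomial_coefficients` computes the unique coefficients of the monomials. The latter uses the standard Möbius inversion over $\mathbf{F}_2$, which is our choice of method and is not taken from the text.

```python
from itertools import product

def delta_polynomial(f, m):
    """Return x -> sum over y of f(y) * prod_i (x_i + y_i + 1), computed in F_2."""
    def F(x):
        total = 0
        for y in product((0, 1), repeat=m):
            term = f(y)
            for xi, yi in zip(x, y):
                term *= (xi + yi + 1) % 2      # equals 1 exactly when x_i == y_i
            total ^= term                      # addition in F_2
        return total
    return F

def monomial_coefficients(f, m):
    """Coefficient of prod_{i in S} x_i for each subset S of {1, ..., m}."""
    coeffs = {}
    for s in product((0, 1), repeat=m):        # indicator vector of S
        c = 0
        for y in product((0, 1), repeat=m):
            if all(yi <= si for yi, si in zip(y, s)):   # support of y inside S
                c ^= f(y)
        coeffs[frozenset(i + 1 for i, si in enumerate(s) if si)] = c
    return coeffs

# Illustrative function: logical OR on two bits, i.e. x1 + x2 + x1*x2 over F_2.
f_or = lambda y: y[0] | y[1]
```

The dictionary has $2^m$ entries, one per subset, matching the dimension count in the proof.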

Now we can calculate any vector of Boolean polynomials by iterating
operations from a small finite list, which is chosen and fixed, e.g.,
$B := \{x, 1, x + y, xy, (x, x)\}$. Such operators are called classical gates. A sequence
of such operators, together with an indication of their arguments from the pre- 
viously computed bits, is called a Boolean circuit. The number of steps in such 
a circuit is considered (a measure of) the time of computation. 

As the word circuit suggests, one may consider even better representations 
by flowcharts, which are oriented graphs, with vertices decorated by the names 
of gates. 
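
A Boolean circuit over the basis B above can be modeled as a list of steps, each naming a gate and the positions of its arguments among the previously computed bits. This is a minimal sketch with hypothetical names (`GATES`, `run_circuit`, `OR_CIRCUIT`); outputs of each gate are simply appended to the list of known bits.

```python
GATES = {
    "id":   lambda x: (x,),               # x
    "one":  lambda: (1,),                 # the constant 1
    "add":  lambda x, y: ((x + y) % 2,),  # x + y in F_2
    "mul":  lambda x, y: (x * y,),        # xy
    "copy": lambda x: (x, x),             # (x, x)
}

def run_circuit(circuit, inputs):
    """Apply the gates in order; return all computed bits and the number of steps."""
    bits = list(inputs)
    for gate, args in circuit:
        bits.extend(GATES[gate](*(bits[i] for i in args)))
    return bits, len(circuit)

# OR(x, y) = x + y + xy, computed in three steps from bits 0 and 1.
OR_CIRCUIT = [("add", (0, 1)), ("mul", (0, 1)), ("add", (2, 3))]
```

The second returned value is the time of computation in the sense just defined; the last bit of the first value is the output of the circuit.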

When the relevant finite sets are not $\mathbf{F}_2^m$, and perhaps have a wrong cardinality
(not a power of 2), we encode their elements by finite sequences of bits
and consider the restriction of Boolean polynomials to the relevant subset. 

As above, a protocol of computation in this model can be represented as the 
finite table consisting of rows (generally of variable length) that accommodate 
sequences of 0’s and 1’s. The initial line of the table is the input. Each subsequent
line must be obtainable from the previous one by the application of one of
the basic functions in B to the sequence of neighboring bits (the remaining bits
are copied unchanged). The last line is the output. The exact location of the 
bits that are changed in each row and the nature of change must be a part of 
the protocol. 

Physically, one can implement the rows as the different registers of the mem- 
ory, or else as the consecutive states of the same register (then we have to make a 
prescription for how to cope with the variable length, e.g., using blank symbols). 


6.4. Turing machines vs. Boolean circuits. Any protocol of the Turing 
computation of a function can be treated as such a protocol of an appropriate 
Boolean circuit, and in this case we have only one register (the initial part of 
the tape) whose states are consecutively changed by the head/processor. We 
will still use the term “gate” in this context. 

A computable function f with infinite domain is the limit of a sequence 
of functions f; between finite sets whose graphs extend each other. A Turing 
program for f furnishes a computable sequence of Boolean circuits, which com- 
pute all f; in turn. Such a sequence is sometimes called uniform. 


6.5. Size, complexity, and polynomial-time computability. The quanti- 
tative theory of computational models deals simultaneously with the space and 
time dimensions of protocols. The preceding subsection focused on time; here 
we introduce space. For Boolean (and Turing machine) protocols this is easy: 
the length of each row of the protocol plus specifications for the next step is 
the space required at that moment. The maximum of these lengths, up to a 
multiplicative constant, bounds the total space required from above and from 
below. 

The case of normal models and infinite constructive worlds U is more 
interesting. 

Generally we will say that a size function $U \to \mathbf{N}$ is any function such that
for every $H \in \mathbf{N}$, there are only finitely many objects of size $\leq H$. Thus the
number of bits $|n| = [\log_2 n] + 1$ and the identical function $\|n\| = n$ are both size
functions on $\mathbf{Z}^+$. Using a numbering, we can transfer them to any constructive
world. In these two examples, the number of constructive objects of size $\leq H$
grows as exp cH, resp. cH. Such a count in more general cases allows one to
make a distinction between the bit size, measuring the length of a description 
of the object, and the volume of the object. 

In most cases we require computability of size functions. However, there 
are exceptions: for example, Kolmogorov complexity is a noncomputable size 
function with very important properties: see VI.9. 

Given a size function (on all relevant worlds) and a versal normal model of 
computations M, we can consider the following complexity problems:


(A) For a given morphism (computable map) $f : U \to V$, estimate the smallest
bit size $K_M(f)$ of the program p such that $f = f_p$.


According to VI.9, there exists an optimal universal model of computation
$\mathcal{U}$ such that with $P = \mathbf{N}$ and the bit size function, for any other model S there
exists a constant c such that for any f,


$K_{\mathcal{U}}(f) \leq K_S(f) + c.$


When $\mathcal{U}$ is chosen, $K_{\mathcal{U}}(f)$ is called the Kolmogorov complexity of f. With a
different choice of $\mathcal{U}$ we will get the same complexity function up to an O(1)
summand.

This complexity measure is highly nontrivial (and especially interesting) in 
the case of one-point U. It measures, then, the size of the most compressed 
description of a variable constructive object in V. This complexity is quite 
“objective,” being almost independent of arbitrary choices. Being uncom- 
putable, it cannot be directly used in computer science. However, it furnishes 
some basic restrictions on computability, strikingly similar to those provided by 
conservation laws in physics. 

Recall that on $\mathbf{N}$ we have $K_{\mathcal{U}}(n) \leq |n| + O(1) = \log_2 \|n\| + O(1)$. The first
inequality “generically” can be replaced by equality, but infinitely often $K_{\mathcal{U}}(n)$
becomes much smaller than |n|.


(B) For a given morphism (recursive map) f : U — V, estimate the time needed 
to calculate f(u),u € D(f), using the program p and compare the results 
for different p and different models of computation. 


(C) The same for the function “maximal size of intermediate configurations in 
the protocol of the computation of f(u) using the program p” (space, or 
memory). 


In the last two problems, we have to compare functions rather than numbers: 
time and space depend on the size of input. Here a cruder polynomial scale 
appears naturally. Let us show how this happens. 

Fix a computational model S with the transition function s computing 
functions $U \to U$, and choose a bit size function $u \mapsto |u|$ on U satisfying
the following crucial assumption, a weakening of the locality requirement valid
for Turing machines:


(i) $|u| - c \leq |s_p(u)| \leq |u| + c$, where the constant c may depend on p but not
on u.


In this case we have $|s_p^m(u)| \leq |u| + c_p m$: the required space grows no more
than linearly with time.

Let now (S′, s′) be another model such that $s_p = s'_q$ for some q. For example,
such q always exists if S′ is versal. Assume that s′ satisfies (i) as well, and
moreover,


(ii) s can be computed in the model S′ in time bounded by a polynomial F in
the bit size of input.


This requirement is certainly satisfied for Turing and Markov models, and 
is generally reasonable, because an elementary step of an algorithm deserves its 
name only if it is computationally tractable. 

Then we can replace one application of $s_p$ to $s_p^m(u)$ by $\leq F(|u| + cm)$
applications of $s'_q$. And if we needed T(u) steps in order to calculate $f_p(u)$
using S, we will need no more than


$\sum_{m=1}^{T(u)} F(|u| + cm)$


steps to calculate
the same function using S′ and q. In a detailed model, there might be a small
additional cost of merging two protocols. This is an example of the compilation 
morphism lifted to the worlds of protocols. 

Thus, from the assumptions (i) and (ii) it follows that functions computable 
in polynomial-time by S have the same property for all reasonable models. 
Notice also that for such functions, $|f(u)| \leq G(|u|)$ for some polynomial G and
that the domain D(f) of such a function is decidable: if after T(|u|) iterations
of $s_p$ we are not in a final state, then $u \notin D(f)$.

Thus we can define the class PF of functions, say N* → N, computable in
polynomial-time using a fixed universal Turing machine and arguing as above 
that this definition is model-independent. 

If we want to extend it to a constructive universe C, however, we will 
have to postulate additionally that any constructive world U comes together 
with a natural class of numberings that together with their inverses are com- 
putable in polynomial-time. The bit size will be defined in terms of one of these 
numberings. 

This postulate, accepted for “all constructive worlds,” seems to be a part of 
the content of the “polynomial Church thesis” invoked by M. Freedman in his 
talk at the Berlin ICM, 1998. 

If we take this strengthening of Church’s thesis for granted, and take two
bit-size functions determined by two polynomial numberings, then the quotient 
of two such size functions is bounded from above and away from zero. 

Below we will be considering only the universes C and worlds U with these 
properties, and |u| will always denote a computable bit size. Gödel’s numbering
for N × N shows that such C is still closed with respect to finite products.
(Notice, however, that the beautiful numbering of N* using primes is not
polynomial-time computable; it may be replaced by another one that is in PF). 


6.6. P/NP problem. Let U be a constructive world. By definition, a subset
$E \subset U$ belongs to the class P if its characteristic function $\chi_E$ (equal to 1 on E
and 0 outside) belongs to the class PF.

Furthermore, $E \subset U$ belongs to the class NP if there exists a subset
$E' \subset U \times V$ belonging to P and a polynomial G such that


$u \in E \iff \exists\, (u,v) \in E'$ with $|v| \leq G(|u|)$.


Here V is another constructive world (which may coincide with U). We will say
that E is obtained from E′ by a polynomially truncated projection.

Such a v can be called a witness of the inclusion $u \in E$. The polynomial-time
calculation establishing that $\chi_{E'}(u,v) = 1$ is a short proof that $u \in E$.

The discussion above establishes in what sense this definition is model- 
independent. 

Clearly, P ⊂ NP.

The question whether these two classes coincide is the celebrated P/NP 
problem. 

A naive algorithm calculating $\chi_E$ from $\chi_{E'}$ by searching for v with
$|v| \leq G(|u|)$ and $\chi_{E'}(u,v) = 1$ will generally take exponential time (because |u| is
a bit-size function). Of course, if one can treat all such v simultaneously, using
massive parallelism, the required time will be polynomial: time will be traded
for space. Or else, if an oracle tells you that $u \in E$ and supplies an appropriate
v, you can convince yourself that this is indeed so in polynomial-time, by
computing $\chi_{E'}(u,v) = 1$.

Notice that enumerable sets can be alternatively described as projections of 
decidable ones, and that in this context projection does create undecidable sets. 
Nobody as yet has been able to translate the diagonalization argument used to 
establish this to the P/NP domain. 

It has long been known that the P/NP problem can be reduced to checking
whether some very particular sets, the NP-complete ones, belong to P.


6.7. Definition. The set $E \subset U$ is called NP-complete if, for any other set
$D \subset V$, $D \in NP$, there exists a function $f : V \to U$, $f \in PF$, such that
$D = f^{-1}(E)$, that is, $\chi_D(v) = \chi_E(f(v))$.

We will sketch the classical argument (due to S. Cook, L. Levin, R. Karp)
showing the existence of NP-complete sets. In fact, the reasoning is constructive:
it furnishes a polynomially computable map producing f from the descriptions
of $\chi_{D'}$ and the truncating polynomial G.

In order to describe one NP-complete problem, we will define an infinite 
family of Boolean polynomials $b_u$ indexed by the following data, constituting
objects u of the constructive world U. One u is a collection


$m \in \mathbf{N}; \quad (S_1, T_1), \dots, (S_N, T_N),$
where $S_i, T_i \subset \{1, \dots, m\}$, and $b_u$ is defined as


$b_u(x_1, \dots, x_m) = \prod_{i=1}^{N} \Big(1 + \prod_{k \in S_i} (1 + x_k) \prod_{j \in T_i} x_j\Big).$


We choose the bit size of u as |u| = mN.
Put
$E = \{u \in U \mid \exists\, v \in \mathbf{F}_2^m,\ b_u(v) = 1\}.$


Using the language of Boolean truth values, one says that v satisfies $b_u$ if
$b_u(v) = 1$, and E is called the satisfiability problem, or SAT.


6.8. Proposition. SAT € NP. 


Proof. In fact, let
$E' = \{(u,v) \mid b_u(v) = 1\} \subset U \times (\bigoplus_{i=1}^{\infty} \mathbf{F}_2).$


Clearly, E is the full projection of E′. A bit of contemplation will convince the
reader that $E' \in P$. In fact, we can calculate $b_u(v)$ performing O(Nm) Boolean
multiplications and additions. The projection to E can be replaced by a polynomially
truncated projection, because we have to check only v of bit size $|v| \leq m$.
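
The contrast between checking a witness and searching for one can be seen directly. In the sketch below an object u is encoded, by our own convention, as a pair (m, list of pairs $(S_i, T_i)$ of Python sets); the names `b`, `is_witness`, and `naive_sat` are hypothetical. Evaluating $b_u(v)$ takes O(Nm) Boolean operations, whereas the naive search runs through up to $2^m$ candidates.

```python
from itertools import product

def b(u, v):
    """b_u(v) = prod_i (1 + prod_{k in S_i} (1 + v_k) * prod_{j in T_i} v_j) over F_2."""
    m, clauses = u
    value = 1
    for S, T in clauses:
        violated = 1                          # stays 1 only if every literal of the factor fails
        for k in S:
            violated *= (1 + v[k - 1]) % 2    # indices are 1-based, as in the text
        for j in T:
            violated *= v[j - 1]
        value *= (1 + violated) % 2
    return value

def is_witness(u, v):
    """Polynomial-time check that v is a witness of u being in SAT."""
    return len(v) == u[0] and b(u, v) == 1

def naive_sat(u):
    """Exhaustive search over F_2^m: exponential in m in the worst case."""
    for v in product((0, 1), repeat=u[0]):
        if b(u, v) == 1:
            return v
    return None

# Illustrative objects: (x1 or x2) and (not x1); and the unsatisfiable x1 and (not x1).
u_sat = (2, [({1, 2}, set()), (set(), {1})])
u_unsat = (1, [({1}, set()), (set(), {1})])
```

Here the i-th factor of $b_u$ vanishes exactly when all $x_k$, $k \in S_i$, are 0 and all $x_j$, $j \in T_i$, are 1, so $S_i$ lists the variables that occur positively in the i-th clause and $T_i$ those that occur negated.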


6.9. Theorem. SAT is NP-complete. 


Proof (sketch). In fact, let $D \in NP$, $D \subset A$, where A is some constructive
world. Take a representation of D as a polynomially truncated projection of
some set $D' \subset A \times B$, $D' \in P$. Choose a normal, say Turing, model of computation
and consider the Turing protocols of computation of $\chi_{D'}(a, b)$ with
fixed a and variable polynomially bounded b. As we have explained above, for 
a given a, any such protocol can be imagined as a table of a fixed polynomially 
bounded size whose rows are the consecutive states of the computation. In the 
“microscopic” description, the positions in this table can be filled only by 0 or 1. 
In addition, each row is supplied by the specification of the position and the 
inner state of the head/processor. Some of the arrangements are valid protocols, 
others are not, but the local nature of the Turing computation allows one to 
produce a Boolean polynomial $b_u$ in appropriate variables such that the valid
protocols are recognized by the fact that this polynomial takes value 1. This 
defines the function f reducing D to E. The construction is so direct that the 
polynomial-time computability of f is straightforward. 

Many natural problems are known to be NP-complete, in particular 
3-SAT. It is defined as the subset of SAT consisting of those u for which 
card $(S_i \cup T_i) = 3$ for all i.


6.10. Remark. Most Boolean functions are not computable in polynomial-time. 
Several versions of this statement can be proved by simple counting. 

First of all, fix a finite basis B of Boolean operations as in 6.3, each acting 
on $\leq a$ bits. Then sequences of these operations of length t generate $O((bn^a)^t)$
Boolean functions $\mathbf{F}_2^n \to \mathbf{F}_2^n$, where b = card B. On the other hand, the number
of all functions $2^{n2^n}$ grows as a double exponential of n and for large n cannot
be obtained in time t polynomially bounded in n.
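
To get a feeling for the counting, one can compare the two quantities numerically. The following sketch computes the least t for which $(bn^a)^t$ reaches $2^{n2^n}$, ignoring the constant hidden in the O-notation; the function name `min_steps` and the default values b = 5, a = 2 (matching the five gates listed earlier, each acting on at most two bits) are assumptions of this illustration.

```python
import math

def min_steps(n, b=5, a=2):
    """Least t with (b * n**a)**t >= 2**(n * 2**n), comparing base-2 logarithms."""
    return math.ceil(n * 2**n / math.log2(b * n**a))

# The required length grows roughly like 2**n, so it outruns any fixed power of n.
table = {n: min_steps(n) for n in (5, 10, 20)}
```

The computation is only a heuristic illustration of why a polynomial bound on t cannot suffice for all functions at once.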

The same conclusion holds if we consider not all functions but only permutations:
Stirling’s formula for card $S_{2^n} = 2^n!$ involves a double exponential.

Here is one more variation of this problem: define the time complexity of a
conjugacy class in $S_{2^n}$ as the minimal number of steps needed to calculate some
permutation in this class. This notion arises if we are interested in calculating
automorphisms of a finite world of cardinality $2^n$ that is not supplied with a
specific encoding by binary words. Then it can happen that a judicious choice of
encoding will drastically simplify the calculation of a given function. However,
for most functions we still will not be able to achieve polynomial-time computability,
because the asymptotic formula for the number of conjugacy classes
(partitions)


exp G 2(2” — x) 
p(2”") 6 — + 


again displays double exponential growth. 


7 Basics of Quantum Computation I: Quantum 
Entanglement 


In this section we will discuss the basics: how to use the superposition principle 
in order to accelerate (certain) classical computations. 

For a minimal physics background, the reader may wish to reread II.12.1-12.9. 


7.1. Description of the problem. Let $N$ be a large number, 
$F : \{0,\dots,N-1\} \to \{0,\dots,N-1\}$ a function such that the computation of each particular 
value $F(x)$ is tractable, that is, can be done in time polynomial in $\log x$. We want 
to compute (to recognize) some property of the graph $(x, F(x))$, for example: 


(i) Find the least period $r$ of $F$, i.e., the least residue $r \bmod N$ such that 
$F(x + r \bmod N) = F(x)$ for all $x$ (the key step in the factorization problem). 

(ii) Find some $x$ such that $F(x) = 1$ or establish that such $x$ does not exist 
(search problem). 


As we already mentioned, a direct attack on such a problem consists in compiling 
the complete list of pairs $(x, F(x))$ and then applying to it an algorithm 
recognizing the property in question. Such a strategy requires at least exponential 
time (as a function of the bit size of $N$), since already the length of the 
list is $N$. Barring a theoretical breakthrough in understanding such problems 
(for example, a proof that P = NP), a practical response might be in exploiting 
the possibility of parallel computing, i.e., calculating simultaneously many, or 
even all, values of $F(x)$. This takes less time but uses (dis)proportionally more 
hardware. 

A remarkable suggestion due to D. Deutsch consists in using a quantum 
superposition of the classical states $|x\rangle$ as the replacement of the union of $N$ 
classical registers, each in one of the initial states $|x\rangle$. To be more precise, here 
is a mathematical model formulated as a definition. 

7.2. Quantum parallel processing: version I. Keeping the notation above, 
assume moreover that $N = 2^n$. 

(i) The quantum space of inputs/outputs is the $2^n$-dimensional complex Hilbert 
space $H_n$ with the orthonormal basis $|x\rangle$, $0 \le x \le N-1$. Vectors $|x\rangle$ are 
called classical states. 

(ii) The quantum version of $F$ is the unique unitary operator $U_F : H_n \to H_n$ 
such that $U_F|x\rangle = |F(x)\rangle$. 

Quantum parallel computing of $F$ is (a physical realization of) a quantum 
system with the state space $H_n$ and the evolution operator $U_F$. 

Naively speaking, if we apply $U_F$ to the initial state which is a superposition 
of all classical states, with, say, equal amplitudes, we will get simultaneously all 
classical values of $F$ (i.e., their superposition): 


$$U_F\left(\frac{1}{\sqrt N}\sum_x |x\rangle\right) = \frac{1}{\sqrt N}\sum_x |F(x)\rangle.$$


Now, this does not look very promising. In fact, $U_F$ exists only if $F$ is a 
permutation, and in this case the left-hand side is simply identical to the 
right-hand side! 

To get a more workable version, we will have to take superpositions with 
different weights. We will also have to devise tricks for replacing, say, search 
functions (1 on desirable elements, 0 elsewhere) by permutations. For this, see 
Section 7.3 below. 

For the time being, we will start discussing various issues related to our 
preliminary picture, before passing to its more realistic modification. 


(A) We put $N = 2^n$ above because we are imagining the respective classical 
system as an $n$-bit register: cf. the discussion of Boolean circuits. Every 
number $0 \le x \le N-1$ is written in the binary notation $x = \sum_i \varepsilon_i 2^i$ and is 
identified with the pure (classical) state $|\varepsilon_{n-1},\dots,\varepsilon_0\rangle$, where $\varepsilon_i = 0$ or $1$ is 
the state of the $i$th register. The quantum system $H_1$ is called a qubit. We 
have $H_n = H_1^{\otimes n}$, $|\varepsilon_{n-1},\dots,\varepsilon_0\rangle = |\varepsilon_{n-1}\rangle \otimes \cdots \otimes |\varepsilon_0\rangle$. 


This conforms to the general principles of quantum mechanics. The Hilbert 
space of the union of systems can be identified with the tensor product of the 
Hilbert spaces of the subsystems. Accordingly, decomposable vectors correspond 
to the states of the compound for which one can say that the individual subsys- 
tems are in definite states. 

In a general state of the register, the individual bits do not store any definite 
values: this is the essence of quantum entanglement. 


(B) Pure quantum states, strictly speaking, are points of the projective space 
$P(H_n)$, that is, complex lines in $H_n$. Traditionally, one considers instead 
vectors of norm one. This leaves undetermined an overall phase factor 
$\exp(i\varphi)$. If we have two state vectors, individual phase factors have no 
objective meaning, but the difference of their phases does have one. This 
difference can be measured by observing effects of quantum interference. 


Quantum interference is highly important and is used for implementing 
efficient quantum algorithms. 

(C) If a quantum system $S$ is isolated from its environment, its dynamical evolution 
with time $t$ is described by the unitary operator acting on its Hilbert 
space, $U(t) = \exp(iHt)$, where $H$ is the Hamiltonian. Therefore 
one option for implementing $U_F$ physically is to design a device for which 
$U_F$ would be a fixed time evolution operator. However, this seemingly contradicts 
many deeply rooted notions of the algorithm theory. For example, 
calculating $F(x)$ for different inputs $x$ takes different times, and it would 
be highly artificial to try to equalize them already in the design. 


Instead, one can try to implement $U_F$ as the result of a sequence of brief 
interactions, carefully controlled by a classical computer, of $S$ with the environment 
(say, laser pulses). Mathematically speaking, $U_F$ is represented as a 
product of some standard unitary operators $U_m, \dots, U_1$ each of which acts only 
on a small subset (two, three) of classical bits. These operators are called quantum 
gates. 

The complexity of the respective quantum computation is determined by its 
length (the number m of the gates) and by the complexity of each of them. 

The latter point is a subtle one: continuous parameters, e.g., phase shifts, 
on which $U_i$ may depend, make the information content of each $U_i$ potentially 
infinite and lead to a suspicion that a quantum computer will in fact perform 
an analog computation, only implemented in a fancy way. 

This point has been discussed and refuted on several occasions by displaying 
those features of quantum computation that distinguish it from both analog 
and digital classical information processing. Philosophically, all arguments are 
variations on the theme of von Neumann’s theorem on the impossibility of 
hidden parameters (cf. II.12). 

One more problem related to the necessity to renounce the image of an 
isolated quantum register is that of stability, or fault tolerance. Even very weak, 
but uncontrolled, interactions with the environment will quickly lead to the 
spreading of quantum noise, destroying the useful information. This is called 
quantum decoherence. 

One defense strategy is the technique of fault-tolerant computation using 
quantum codes for producing continuous variables highly protected from exter- 
nal noise. 


7.3. Reducing general functions to permutations. As we have already 
remarked, the requirement that $F$ must be a permutation is highly restrictive: 
for instance, in the search problem $F$ takes only two values. 

There is nothing justifying this restriction in the schemes of classical computation, 
but in our quantum model, only permutations $F$ extend to unitary 
operators (“quantum reversibility”). 

The standard way out consists in introducing two $n$-bit registers instead of 
one, for keeping the value of the argument as well as that of the function. This 
also conforms with our initial idea that we want to learn something about the 
graph of $F$. 

More precisely, if $F(|x\rangle)$ is an arbitrary function of classical bits, we can 
replace it by the permutation $\tilde F(|x, y\rangle) := |x, F(x) \oplus y\rangle$, where $\oplus$ is the Boolean 
(bitwise) sum. This involves no more than a polynomial increase of the classical 
complexity, and the restriction of $\tilde F$ to $y = 0$ produces the graph of $F$, which 
we need anyway for the type of problems we are interested in. 
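
As a small sanity check of this trick, the following sketch builds $(x, y) \mapsto (x, F(x) \oplus y)$ for a sample search function and verifies by brute force that it is a permutation (indeed an involution) whose restriction to $y = 0$ lists the graph of $F$. The function `search_function` and the register width are hypothetical choices made only for illustration.

```python
from itertools import product

def make_reversible(F, n_bits):
    """Return the map (x, y) -> (x, F(x) XOR y) on pairs of n_bits-bit integers."""
    mask = (1 << n_bits) - 1
    def F_tilde(x, y):
        return x, (F(x) ^ y) & mask
    return F_tilde

def search_function(x):
    """Hypothetical search function: 1 on the single marked element 5, else 0."""
    return 1 if x == 5 else 0

n_bits = 3
F_tilde = make_reversible(search_function, n_bits)
pairs = list(product(range(2 ** n_bits), repeat=2))
images = [F_tilde(x, y) for x, y in pairs]

# A bijection of a finite set hits every element exactly once.
is_permutation = sorted(images) == sorted(pairs)
# XOR-ing F(x) twice restores y, so the map is its own inverse.
is_involution = all(F_tilde(*F_tilde(x, y)) == (x, y) for x, y in pairs)
# Restricting to y = 0 recovers the graph of F.
graph = [F_tilde(x, 0) for x in range(2 ** n_bits)]
print(is_permutation, is_involution, graph)
```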

In the quantum Boolean circuit version this trick must be applied to all 
gates. 

More precisely, in order to process a classical algorithm (sequence of Boolean 
gates) for computing $F$ into the quantum one, we replace each classical gate 
by the respective reversible quantum gate, i.e., by the unitary operator corresponding 
to it tensored with the identity operator. Besides two registers for 
keeping $|x\rangle$ and $F(|x\rangle)$ we will have to introduce as well extra qubits in which 
we are not particularly interested. The corresponding Hilbert space and its content 
are sometimes referred to as “scratchpad,” “garbage,” etc. Besides ensuring 
reversibility, additional space and garbage can be introduced as well for considering 
functions $F : \{0,\dots,N-1\} \to \{0,\dots,M-1\}$, where $N$, $M$ are not 
powers of two (then we extend them to the nearest power of two). For more 
details, see the next section. 

Notice that the choice of gate array (Boolean circuit) as the classical model 
of computation is essential in the following sense: a quantum routine cannot 
use conditional instructions. Indeed, to implement such an instruction we must 
observe the memory in the midst of calculation, but the observation generally 
will change its current quantum state. 

In the same vein, we must avoid copying instructions, because the classical 
copying operator $|x\rangle \mapsto |x\rangle \otimes |x\rangle$ is not linear. In particular, each output qubit 
from a quantum gate can be used only in one gate at the next step (if several 
gates are used in parallel): cloning is not allowed. 

These examples show that the basics of quantum code writing will have a 
very distinct flavor. 

We now pass to the problems posed by the input/output routines. 

Input, or initialization, in principle can be implemented in the same way as 
a computation: we produce an input state starting, e.g., from the classical state 
$|0\rangle$ and applying a sequence of basic unitary operators: see the next section. 
Output, however, involves an additional quantum-mechanical notion: that of 
observation. 


7.4. Quantum observation. The simplest model of observation of a quantum 
system with the Hilbert space H is that of interaction with another system, 
and their subsequent disentanglement. 

Possible results of such an interaction will form an orthonormal basis $|\chi_i\rangle$ 
of $H$ (depending on the physical details of observation). If our system was in 
some entangled state $|\psi\rangle$ at the moment of observation, it will be observed in 
some state $|\chi_i\rangle$ with probability $|\langle\chi_i|\psi\rangle|^2$. 

This means first of all that every quantum computation is inherently probabilistic. 
Observing (a part of) the quantum memory is not exactly the same 
as “printing the output.” We must plan a series of runs of the same quantum 
program and the subsequent classical processing of the observed results, and 
we can hope only to get the desired answer with probability close to one. 

Furthermore, this means that by implementing quantum parallelism simple-mindedly 
as at the beginning of this section, and then observing the memory 
as if it were the classical $n$-bit register, we will simply get some value $F(x)$ with 
probability $1/N$. This does not use the potential of the quantum parallelism. 
Therefore we formulate a corrected version of this notion, allowing more flexibility 
and stressing the additional tasks of the designer, each of which eventually 
contributes to the complexity estimate. 
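
The last remark is easy to reproduce numerically. The sketch below stores a state as a plain list of amplitudes, applies a permutation operator to the uniform superposition, and samples observations with probabilities $|\langle x|\psi\rangle|^2$. The particular permutation `F` and the helper names are assumptions chosen for illustration.

```python
import random
from collections import Counter

def born_probabilities(amplitudes):
    """Probability of observing each classical state: squared modulus of its amplitude."""
    return [abs(a) ** 2 for a in amplitudes]

def observe(amplitudes, rng):
    """Sample one classical state x with probability |amplitude of x|^2."""
    probs = born_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=probs, k=1)[0]

n = 3
N = 2 ** n
F = [(5 * x + 3) % N for x in range(N)]  # hypothetical permutation of {0, ..., N-1}

uniform = [1 / N ** 0.5] * N
# U_F permutes the basis vectors: the amplitude of |x> moves to |F(x)>.
after_UF = [0.0] * N
for x in range(N):
    after_UF[F[x]] += uniform[x]

probs = born_probabilities(after_UF)
rng = random.Random(0)
counts = Counter(observe(after_UF, rng) for _ in range(1000))
print(probs)
print(sorted(counts.items()))
```

Every outcome has probability $1/N$, so a single observation reveals only one value of $F$ at a random point, exactly as a classical evaluation at a random argument would.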


7.5. Quantum parallel processing: version II. To solve efficiently a problem 
involving properties of the graph of a function $F$, we must design: 


(i) An auxiliary unitary operator $U$ carrying the relevant information about 
the graph of $F$. 
(ii) A computationally feasible realization of $U$ with the help of standard quantum 
gates. 
(iii) A computationally feasible realization of the input subroutine. 
(iv) A computationally feasible classical algorithm processing the results of 
many runs of quantum computation. 


All of this must be supplemented by quantum error-correcting encoding, 
which we will not address here. In the next section we will discuss some standard 
quantum subroutines. 


8 Selected Quantum Subroutines 


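8.1. Initialization. Using the same conventions as in Section 7 and the subsequent 
comments, in particular the identification $H_n = H_1^{\otimes n}$, we have 


$$\frac{1}{\sqrt N}\sum_{x=0}^{N-1}|x\rangle = \frac{1}{\sqrt N}\sum_{\varepsilon_i = 0,1}|\varepsilon_{n-1}\cdots\varepsilon_0\rangle = \left(\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)\right)^{\otimes n}.$$


In other words, 


$$\frac{1}{\sqrt N}\sum_{x=0}^{N-1}|x\rangle = U_1^{(n-1)}\cdots U_1^{(0)}|0\cdots 0\rangle,$$


where $U_1 : H_1 \to H_1$ is the unitary operator 


$$|0\rangle \mapsto \frac{1}{\sqrt 2}(|0\rangle + |1\rangle), \qquad |1\rangle \mapsto \frac{1}{\sqrt 2}(|0\rangle - |1\rangle),$$

and $U_1^{(i)} = \mathrm{id}\otimes\cdots\otimes U_1\otimes\cdots\otimes\mathrm{id}$ acts only on the $i$th qubit. 

Thus making the quantum gate $U_1$ act on each memory bit, one can in 
$n$ steps initialize our register in the state that is the superposition of all $2^n$ 
classical states with equal weights. 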
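
The initialization step can be simulated directly on a list of $2^n$ amplitudes. In the sketch below the index of an amplitude is the integer $x$ and bit $i$ of $x$ is the state of the $i$th qubit; the function names are illustrative, and the gate is the one-qubit operator $|0\rangle \mapsto (|0\rangle + |1\rangle)/\sqrt 2$, $|1\rangle \mapsto (|0\rangle - |1\rangle)/\sqrt 2$ of 8.1.

```python
import math

def apply_one_qubit_gate(state, i):
    """Apply |0> -> (|0>+|1>)/sqrt2, |1> -> (|0>-|1>)/sqrt2 to qubit i."""
    s = 1 / math.sqrt(2)
    out = state[:]
    bit = 1 << i
    for x in range(len(state)):
        if not x & bit:  # x has bit i equal to 0; x | bit is its partner with bit i = 1
            a0, a1 = state[x], state[x | bit]
            out[x] = s * (a0 + a1)
            out[x | bit] = s * (a0 - a1)
    return out

def initialize(n):
    """Start from |0...0> and apply the gate to each of the n qubits in turn."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    for i in range(n):
        state = apply_one_qubit_gate(state, i)
    return state

state = initialize(4)
print(state)
```

After the $n$ gate applications all $2^n$ amplitudes are equal to $1/\sqrt{N}$ up to rounding.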


8.2. Quantum computations of classical functions. Let $\mathcal{B}$ be a finite basis 
of classical gates containing the one-bit identity and generating all Boolean 
circuits, and $F : \mathbf{F}_2^m \to \mathbf{F}_2^n$ a function. We will describe how to turn a Boolean 
circuit of length $L$ calculating $F$ into another Boolean circuit of comparable 
length consisting only of reversible gates, and calculating a modified function, 
which, however, contains all information about the graph of $F$. Reversibility 
means that each step is a bijection (actually, an involution) and hence can be 
extended to a unitary operator, that is, a quantum gate. For a gate $f$, define 
$\tilde f(|x, y\rangle) = |x, f(x) + y\rangle$ as in 7.3 above. 

8.3 Claim. A Boolean circuit $S$ of length $L$ in the basis $\mathcal{B}$ can be processed 
into the reversible Boolean circuit $\tilde S$ of length $O((L + m + n)^2)$ calculating a 
permutation $H : \mathbf{F}_2^{m+n+L} \to \mathbf{F}_2^{m+n+L}$ with the following property: 


$$H(x, y, 0) = (x, F(x) + y, 0) = (\tilde F(x, y), 0).$$

Here $x, y, z$ have sizes $m, n, L$ respectively. 


Proof. We will understand $L$ here as the sum of sizes of the outputs of all 
gates involved in the description of $S$. We first replace in $S$ each gate $f$ by 
its reversible counterpart $\tilde f$. This involves inserting extra bits, which we put 
side by side into a new register of total length $L$. The resulting subcircuit will 
calculate a permutation $K : \mathbf{F}_2^{m+L} \to \mathbf{F}_2^{m+L}$ such that $K(x, 0) = (F(x), G(x))$ 
for some function $G$ (garbage). 

Now add to the memory one more register of size $n$ keeping the variable $y$. 
Extend $K$ to the permutation $\bar K : \mathbf{F}_2^{m+L+n} \to \mathbf{F}_2^{m+L+n}$ keeping $y$ intact: 
$\bar K : (x, 0, y) \mapsto (F(x), G(x), y)$. Clearly, $\bar K$ is calculated by the same Boolean 
circuit as $K$, but with extended register. 

Extend this circuit by the one adding the contents of the first and the 
third registers: $(F(x), G(x), y) \mapsto (F(x), G(x), F(x) + y)$. Finally, build the 
last extension that calculates $\bar K^{-1}$ and consists of reversed gates calculating 
$\bar K$ in reverse order. This clears the middle register (scratchpad) and produces 
$(x, 0, F(x) + y)$. The whole circuit requires $O(L + m + n)$ gates if we allow 
the application of them to not necessarily neighboring bits. Otherwise we must 
insert gates for local permutations, which will replace this estimate by 
$O((L + m + n)^2)$. 
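
The three stages of the proof (compute, add the result to $y$, run the computation backwards) can be traced on a toy example. The sketch below is only an illustration under stated assumptions: the function is the three-bit AND, the reversible gates are the Toffoli and controlled-NOT gates, and the register layout is chosen for readability rather than to match the ordering of registers in the proof.

```python
from itertools import product

def toffoli(bits, a, b, target):
    """Reversible AND: flip bits[target] exactly when bits[a] and bits[b] are both 1."""
    bits[target] ^= bits[a] & bits[b]

def cnot(bits, control, target):
    """Add (mod 2) the control bit to the target bit."""
    bits[target] ^= bits[control]

# Register layout (an illustrative choice): x0 x1 x2 | scratch s, f | y
X0, X1, X2, S, FB, Y = range(6)
COMPUTE = [(toffoli, (X0, X1, S)), (toffoli, (S, X2, FB))]  # f = x0 AND x1 AND x2

def reversible_and3(x, y):
    bits = list(x) + [0, 0] + [y]
    for gate, wires in COMPUTE:            # compute F(x), leaving garbage in s
        gate(bits, *wires)
    cnot(bits, FB, Y)                      # add F(x) to y
    for gate, wires in reversed(COMPUTE):  # the same involutive gates in reverse order
        gate(bits, *wires)
    return bits

results = {(x, y): reversible_and3(x, y)
           for x in product((0, 1), repeat=3) for y in (0, 1)}
for (x, y), bits in sorted(results.items()):
    print(x, y, bits)
```

In every row the input bits are unchanged, the two scratch bits are back to 0, and the last bit holds $F(x) + y$.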


8.4. Fast Fourier transform. Finding the least period of a function of one 
real variable can be done by calculating its Fourier transform and looking at 
its maxima. The same strategy is applied by Shor in his solution of the factorization 
problem. We will show now that the discrete Fourier transform $\Phi_n$ is 
computationally easy (quantum polynomial-time). We define $\Phi_n : H_n \to H_n$ 
by 


$$\Phi_n(|x\rangle) = \frac{1}{\sqrt N}\sum_{c=0}^{N-1} |c\rangle \exp(2\pi i c x/N).$$


In fact, it is slightly easier to implement directly the operator 


$$\Phi'_n(|x\rangle) = \frac{1}{\sqrt N}\sum_{c=0}^{N-1} |c'\rangle \exp(2\pi i c x/N),$$


where $c'$ is $c$ read from right to left. The effects of the bit reversal can then be 
compensated at a later stage without difficulty. 

Let $U_2^{(kj)} : H_n \to H_n$, $k < j$, be the quantum gate that acts on the pair of 
the $k$th and $j$th qubits in the following way: it multiplies $|11\rangle$ by $\exp(i\pi/2^{j-k})$ 
and leaves the remaining classical states $|00\rangle, |01\rangle, |10\rangle$ intact. 


8.5. Lemma. We have 


$$\Phi'_n = \prod_{k=0}^{n-1}\left( U_1^{(k)} \prod_{j=k+1}^{n-1} U_2^{(kj)} \right).$$


By our rules of the game, this presentation has polynomial length in the 
sense that it involves only $O(n^2)$ gates. However, implementation of $U_2^{(kj)}$ 
requires controlling variable phase factors that tend to 1 as $j - k$ grows. Moreover, 
arbitrary pairs of qubits must allow quantum-mechanical coupling, so that 
for large $n$, the interaction between qubits must be nonlocal. The contribution 
of these complications to the notion of complexity cannot be estimated without 
going into the details of the physical arrangement. Therefore we will add a few 
words on this subject. 
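
A decomposition of the Fourier transform into one-qubit gates of 8.1 and two-qubit phase gates can be checked numerically for small $n$. The sketch below assumes the following reading of the product: the factors are taken for $k = 0, \dots, n-1$, each consisting of the one-qubit gate on qubit $k$ preceded by the phase gates coupling qubit $k$ with all qubits $j > k$, and the rightmost factor acts first. Under that assumption the result is compared with the bit-reversed discrete Fourier transform; the function names are illustrative.

```python
import cmath
import math

def hadamard(state, i):
    """One-qubit gate of 8.1 applied to qubit i."""
    s, bit, out = 1 / math.sqrt(2), 1 << i, state[:]
    for x in range(len(state)):
        if not x & bit:
            a0, a1 = state[x], state[x | bit]
            out[x], out[x | bit] = s * (a0 + a1), s * (a0 - a1)
    return out

def phase(state, k, j):
    """Multiply by exp(i*pi/2**(j-k)) the states in which qubits k and j are both 1."""
    w = cmath.exp(1j * math.pi / 2 ** (j - k))
    return [a * w if (x >> k) & 1 and (x >> j) & 1 else a
            for x, a in enumerate(state)]

def fourier_by_gates(state, n):
    # The rightmost factor of the product acts first, so k runs downward.
    for k in range(n - 1, -1, -1):
        for j in range(k + 1, n):
            state = phase(state, k, j)
        state = hadamard(state, k)
    return state

def bit_reverse(c, n):
    """Read the n-bit binary notation of c from right to left."""
    return int(format(c, f"0{n}b")[::-1], 2)

def fourier_direct(x, n):
    """Amplitudes of the bit-reversed Fourier transform of the classical state |x>."""
    N = 2 ** n
    out = [0j] * N
    for c in range(N):
        out[bit_reverse(c, n)] = cmath.exp(2j * math.pi * c * x / N) / math.sqrt(N)
    return out

n = 3
max_error = 0.0
for x in range(2 ** n):
    basis = [0j] * 2 ** n
    basis[x] = 1
    got, want = fourier_by_gates(basis, n), fourier_direct(x, n)
    max_error = max(max_error, max(abs(g - w) for g, w in zip(got, want)))
print(max_error)
```

The number of gate applications in `fourier_by_gates` is $n + n(n-1)/2$, in agreement with the $O(n^2)$ count.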

One possible implementation of a quantum register consists of a collection of 
ions (charged atoms) in a linear harmonic trap (optical cavity). Two of the elec- 
tronic states of each ion are denoted by |0) and |1) and represent a qubit. Laser 
pulses transmitted to the cavity through the optical fibers and controlled by 
the classical computer are used to implement gates and readout. The Coulomb 
repulsion keeps ions apart (spatial selectivity), which allows the preparation of 
each ion separately in any superposition of |0) and |1) by timing the laser pulse 
properly and preparing its phase carefully. The same Coulomb repulsion allows 
for collective excitations of the whole cluster, whose quanta are called phonons. 
Such excitations are produced by laser pulses as well under appropriate reso- 
nance conditions. The resulting resonance selectivity combined with the spatial 
selectivity implements a controlled entanglement of the ions that can be used 
in order to simulate two- and three-bit gates. 

Another recent suggestion is to use a single molecule as a quantum register, 
representing qubits by nuclear spins of individual atoms, and using interac- 
tions through chemical bonds in order to perform multiple-bit logic. The clas- 
sical technique of nuclear magnetic resonance developed since the 1940s, which 
allows one to work with many molecules simultaneously, provides the startup 
technology for this project. 


8.6. Quantum search. All the subroutines described up to now have boiled 
down to some identities in the unitary groups involving products of not too 
many operators acting on subspaces of small dimension. They did not involve 
output subroutines and therefore did not “compute” anything in the traditional 
sense of the word. We will now describe the beautiful quantum search algorithm 
due to L. Grover, which produces a new identity of this type, but also demon- 
strates the effect of observation and the way one can use quantum entanglement 
in order to exploit the potential of quantum parallelism. 

We will treat only the simplest version. Let $F : \mathbf{F}_2^n \to \{0,1\}$ be a function 
taking the value 1 at exactly one point $x_0$. We want to compute $x_0$. We assume 
that $F$ is computable in polynomial-time, or else that its values are given by an 
oracle. Classical search for $x_0$ requires on the average about $N/2$ evaluations of 
$F$, where $N = 2^n$. 

In the quantum version, we will assume that we have a quantum Boolean 
circuit (or quantum oracle) calculating the unitary operator $H_n \to H_n$, 


$$I_F : |x\rangle \mapsto e^{\pi i F(x)}|x\rangle.$$


In other words, $I_F$ is the reflection inverting the sign of $|x_0\rangle$ and leaving the 
remaining classical states intact. 

Moreover, we put $J = -I_\delta$, where $\delta : \mathbf{F}_2^n \to \{0,1\}$ takes the value 1 only at 
0, and $V = U_1^{(n-1)} \cdots U_1^{(0)}$ as in 8.1. 


8.6. Claim. (i) The real plane in $H_n$ spanned by the uniform superposition $\xi$ of 
all classical states and by $|x_0\rangle$ is invariant with respect to $T := VJVI_F$. 


(ii) $T$ restricted to this plane is the rotation (from $\xi$ to $|x_0\rangle$) by the angle $\varphi_N$, 
where 


$$\cos\varphi_N = 1 - \frac{2}{N}, \qquad \sin\varphi_N = \frac{2\sqrt{N-1}}{N}.$$


The check is straightforward. 

Now, $\varphi_N$ is close to $2/\sqrt N$, and for the initial angle $\varphi$ between $\xi$ and $|x_0\rangle$ 
we have 

$$\cos\varphi = \frac{1}{\sqrt N}.$$

Hence in $[\varphi/\varphi_N] \approx \pi\sqrt N/4$ applications of $T$ to $\xi$ we will get the state very 
close to $|x_0\rangle$. Stopping the iteration of $T$ after as many steps and measuring 
the outcome in the basis of classical states, we will obtain $|x_0\rangle$ with probability 
very close to one. 
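
The rotation can be observed in a direct simulation. The sketch below applies the sign flip at the marked point followed by the reflection $2|\xi\rangle\langle\xi| - 1$ (which is what conjugating the reflection fixing only $|0\cdots 0\rangle$ by the initialization operator amounts to) about $\pi\sqrt N/4$ times; the register size and the marked point are arbitrary assumptions.

```python
import math

def search_success_probability(n, x0):
    """Iterate the search operator on the uniform state; return (steps, P(observe x0))."""
    N = 2 ** n
    state = [1 / math.sqrt(N)] * N               # uniform superposition
    steps = math.floor(math.pi * math.sqrt(N) / 4)
    for _ in range(steps):
        state[x0] = -state[x0]                   # invert the sign of the marked state
        mean = sum(state) / N
        state = [2 * mean - a for a in state]    # reflection 2|xi><xi| - 1
    return steps, state[x0] ** 2

steps, p = search_success_probability(8, x0=77)
print(steps, p)
```

For $N = 256$ this uses 12 iterations, against about 128 evaluations expected classically.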

One application of $T$ replaces in the quantum search one evaluation of $F$. 
Thus, thanks to quantum parallelism, we achieve a polynomial speedup in comparison 
with the classical search. The case in which $F$ takes the value 1 at several 
points and we want to find only one of them can be treated by an extension 
of this method. If there are $n$ such points, the algorithm requires about $\sqrt{N/n}$ 
steps, and $n$ need not be known a priori. 

Still, this does not help solving NP-complete problems, because the square 
root of an exponential is still an exponential. 


9 Shor’s Factoring Algorithm 


Efficient factorization of large integers became in the last decades an important 
applied problem, because standard public key cryptosystems rely on the perceived 
difficulty of this problem. At least in 2000, it was practically impossible 
to factorize a product of two 150-decimal-digit primes: estimated running times 
of the best existing factorization algorithms were in the billions of years. 

Producing such public key cryptosystems on an industrial scale requires 
mass production of large primes. This last problem recently was shown to be in 
the class P (M. Agrawal, N. Kayal, N. Saxena). Existing practical algorithms 
can prove primality of a 10000-bit number in several weeks. 

For this reason, when P. Shor demonstrated that a quantum algorithm can 
efficiently solve the factorization problem, and thus provide means for system- 
atically breaking the public key cryptosystems, his discovery attracted much 
public attention. We will sketch his algorithm in this section. 


9.1. Notation. Let $M$ be a natural number to be factored. We will assume that 
it is odd and is not a power of a prime number. 

Denote by $N$ the volume of the basic memory register we will be using 
(not counting scratchpad). Its bit size $n$ will be about twice that of $M$. More 
precisely, choose $M^2 < N = 2^n < 2M^2$. Finally, let $1 < t < M$ be a random 
parameter with $\gcd(t, M) = 1$. This condition can be checked classically in time 
polynomial in $n$. 

Below we will describe one run of Shor’s algorithm, in which $t$ (and of course, 
$M$, $N$) is fixed. Generally, polynomially many runs will be required, in which 
the value of t can remain the same or be chosen anew. This is needed in order 
to gather statistics. Shor’s algorithm is a probabilistic one, with two sources of 
randomness that must be clearly distinguished. One is built into the classical 
probabilistic reduction of factoring to the finding of the period of a function. 
Another stems from the necessity of observing quantum memory, which, too, 
produces random results. 

More precise estimates than those given here show that a quantum computer 
that can store about $3n$ qubits can find a factor of $M$ in time of order 
$n^3$ with probability close to 1. On the other hand, it is widely believed that no 
recursive function of the type $M \mapsto$ a proper factor of $M$ belongs to PF. 


9.2. A classical algorithm. Put 

$$r := \min\{\rho \mid t^\rho \equiv 1 \bmod M\},$$


which is the least period of $F : a \mapsto t^a \bmod M$. 


Claim. If one can efficiently calculate $r$ as a function of $t$, one can find a proper 
divisor of $M$ in time polynomial in $\log_2 M$ with probability $\ge 1 - M^{-m}$ for any 
fixed $m$. 

In fact, choose such $t$ for which the period $r$ satisfies 


$$r \equiv 0 \bmod 2, \qquad t^{r/2} \not\equiv -1 \bmod M.$$


Then $\gcd(t^{r/2} \pm 1, M)$ is a proper divisor of $M$. Notice that gcd is computable 
in polynomial-time. 

The probability that this condition holds is $\ge 1 - 1/2^{k-1}$, where $k$ is the 
number of different odd prime divisors of $M$, hence $\ge 1/2$ in our case. Therefore 
we will find a good $t$ with probability $\ge 1 - M^{-m}$ in $O(\log M)$ tries. The longest 
calculation in one try is that of $t^{r/2}$. The usual squaring method performs this 
in polynomial-time as well. 
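
The classical reduction is short enough to run on small numbers. In the sketch below the period is found by brute force, which takes exponential time in the bit size and merely stands in for the efficient period-finding step assumed in the claim; the function names and the sample values of $t$ and $M$ are illustrative.

```python
from math import gcd

def order(t, M):
    """Least r > 0 with t**r congruent to 1 mod M, found by brute force."""
    r, power = 1, t % M
    while power != 1:
        power = power * t % M
        r += 1
    return r

def proper_divisors_from_period(t, M):
    """Return (r, divisors); divisors is empty when t is not a good choice."""
    assert gcd(t, M) == 1
    r = order(t, M)
    if r % 2:
        return r, []                 # odd period: choose another t
    half = pow(t, r // 2, M)         # modular exponentiation by repeated squaring
    if half == M - 1:
        return r, []                 # t**(r/2) is -1 mod M: choose another t
    return r, sorted({gcd(half - 1, M), gcd(half + 1, M)})

print(proper_divisors_from_period(2, 21))
print(proper_divisors_from_period(7, 15))
print(proper_divisors_from_period(14, 15))
```

The first two calls yield proper divisors of 21 and 15; the third illustrates a bad choice, since $14 \equiv -1 \bmod 15$.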


9.3. Quantum algorithm calculating r. Here we describe one run of the 
quantum algorithm that purports to compute $r$, given $M$, $N$, $t$. We will use the 
working register that can keep a pair consisting of a variable $0 \le a \le N-1$ and 
the respective value of the function $t^a \bmod M$. One more register will serve as the 
scratchpad needed to compute $|a, t^a \bmod M\rangle$ reversibly. When this calculation 
is completed, the content of the scratchpad will be reversibly erased: cf. 8.3 
above. In the remaining part of the computation the scratchpad will no longer 
be used, so we may decouple it and forget about it. 

The quantum computation consists of four steps, three of which were 
described in Section 8: 


(i) Partial initialization produces from $|0,0\rangle$ the superposition 

$$\frac{1}{\sqrt N}\sum_{a=0}^{N-1}|a, 0\rangle.$$

(ii) Reversible calculation of $F$ processes this state into 

$$\frac{1}{\sqrt N}\sum_{a=0}^{N-1}|a, t^a \bmod M\rangle.$$

(iii) Partial Fourier transform then furnishes 

$$\frac{1}{N}\sum_{a=0}^{N-1}\sum_{c=0}^{N-1}\exp(2\pi i ac/N)\,|c, t^a \bmod M\rangle.$$

(iv) The last step is the observation of this state with respect to the system of 
classical states $|c, m \bmod M\rangle$. This step produces some concrete output 

$$|c, t^k \bmod M\rangle$$

with probability 

$$\left|\frac{1}{N}\sum_{a:\ t^a \equiv t^k \bmod M}\exp(2\pi i ac/N)\right|^2.$$


The remaining part of the run is assigned to the classical computer and consists 
of the following steps. 


(A) Find the best approximation (in lowest terms) to $c/N$ with denominator 
$r' < M < \sqrt N$. 

As we will see below, we may hope that $r'$ will coincide with $r$ in at least 
one run among at most polynomially many. For this reason, we will try $r'$ in 
the role of $r$ right away: 


(B) If $r' \equiv 0 \bmod 2$, calculate $\gcd(t^{r'/2} \pm 1, M)$. 

If $r'$ is odd, or if $r'$ is even but we did not get a proper divisor of $M$, repeat 
the run $O(\log\log M)$ times with the same $t$. In case of failure, change $t$ and 
start a new run. 
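
The classical post-processing of one observed value can be sketched with the standard library: `Fraction.limit_denominator` returns the closest fraction with a bounded denominator, which serves here as the best approximation of step (A). The observed value `c = 85` is a hypothetical outcome chosen by hand for $M = 21$, $N = 512$, $t = 2$; no quantum run is simulated.

```python
from fractions import Fraction
from math import gcd

def candidate_period(c, N, M):
    """Step (A): denominator of the best approximation to c/N with denominator < M."""
    return Fraction(c, N).limit_denominator(M - 1).denominator

def try_factor(t, r_candidate, M):
    """Step (B): proper divisors of M obtained from the candidate period, possibly none."""
    if r_candidate % 2:
        return []
    half = pow(t, r_candidate // 2, M)
    found = {gcd(half - 1, M), gcd(half + 1, M)}
    return sorted(d for d in found if 1 < d < M)

M, N, t = 21, 512, 2   # M**2 = 441 < N = 2**9 < 2 * M**2 = 882
c = 85                 # hypothetical observed value, close to N/6
r_candidate = candidate_period(c, N, M)
divisors = try_factor(t, r_candidate, M)
print(r_candidate, divisors)
```

Here the candidate equals the true period 6 of $2 \bmod 21$, and step (B) returns both prime factors of 21.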


9.4. Justification. We will now show that given $t$, from the observed values 
of $|c, t^k \bmod M\rangle$ we can find in $O(\log\log M)$ runs the correct value of $r$ with 
probability close to 1. 

Let us call the observed value of $c$ good if 


$$\exists\, l \in \left[-\frac{r}{2}, \frac{r}{2}\right], \quad rc \equiv l \bmod N.$$


In this case there exists $d$ such that 


$$-\frac{r}{2} \le rc - dN = l \le \frac{r}{2},$$

so that 

$$\left|\frac{c}{N} - \frac{d}{r}\right| \le \frac{1}{2N}.$$


Hence if $c$ is good, then $r'$ found in 9.3 (A) in fact divides $r$. 

Now call $c$ very good if $r' = r$. 

Estimating the exponential sum in 9.3 (iv), we can easily check that the 
probability of observing a good $c$ is $\ge 1/3r^2$. On the other hand, there are 
$r\varphi(r)$ states $|c, t^k \bmod M\rangle$ with very good $c$. Thus to find a very good $c$ with 
high probability, $O(r^2 \log r)$ runs will suffice. 
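
For small parameters the exponential sums of 9.3 (iv) can simply be evaluated. The sketch below computes the probability of each observed $c$ (summed over the second register) and the total probability of the good values; the parameters $M = 21$, $N = 512$, $t = 2$ with period $r = 6$ are an illustrative assumption, and the code only tabulates the distribution without simulating a quantum device.

```python
import cmath
import math

def outcome_probabilities(t, M, N):
    """P(c): sum over residues t^k of |(1/N) * sum_{a: t^a = t^k} exp(2 pi i a c / N)|^2."""
    classes = {}
    for a in range(N):
        classes.setdefault(pow(t, a, M), []).append(a)
    probs = []
    for c in range(N):
        p = 0.0
        for members in classes.values():
            amp = sum(cmath.exp(2j * math.pi * a * c / N) for a in members) / N
            p += abs(amp) ** 2
        probs.append(p)
    return probs

def is_good(c, r, N):
    """True if r*c is congruent mod N to some l with |l| <= r/2."""
    l = (r * c) % N
    l = l - N if l > N / 2 else l
    return abs(l) <= r / 2

t, M, N, r = 2, 21, 512, 6   # r = 6 is the order of 2 modulo 21
probs = outcome_probabilities(t, M, N)
good = [c for c in range(N) if is_good(c, r, N)]
good_mass = sum(probs[c] for c in good)
print(good, good_mass)
```

In this example the good values of $c$ are the six integers closest to multiples of $N/r$, and together they carry most of the probability.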


10 Kolmogorov Complexity and Growth of Recursive 
Functions 


Consider general functions $f : \mathbf{Z}^+ \to \mathbf{Z}^+$. Computability theory uses several 
growth scales for such functions, of which two are most useful: $f$ may be majorized 
by some recursive function (e.g., when it is itself recursive), or by a 
polynomial (e.g., when it is computable in polynomial-time). Linear growth 
does not seem particularly relevant in this context. However, this impression 
is quite misleading, at least if one allows one most important uncomputable 
reordering of $\mathbf{Z}^+$. In fact, we make the following claim: 


10.1. Claim. There exists a permutation $K : \mathbf{Z}^+ \to \mathbf{Z}^+$ such that for any 
partially recursive function $f : \mathbf{N} \to \mathbf{N}$ there exists a constant $c$ with the 
property 

$$K \circ f \circ K^{-1}(n) \le cn \quad \text{for all } n \in K(D(f)).$$

Moreover, $K$ is bounded by a linear function, but $K^{-1}$ is not bounded by any 
recursive function. 


PROOF. We will use the Kolmogorov complexity measure of integers, as was 
explained in VI.9. We first recall its definition. 

For a recursive function $u : \mathbf{Z}^+ \to \mathbf{Z}^+$ and $x \in \mathbf{Z}^+$, put $C_u(x) := \min\{k \mid u(k) = x\}$, 
or $\infty$ if such $k$ does not exist. Call such a function $u$ optimal if for any other 
recursive function $v$, there exists a constant $c_{u,v}$ such that $C_u(x) \le c_{u,v} C_v(x)$ for 
all $x$. Optimal functions do exist (see Theorem VI.9.2); in particular, they take 
all positive integer values (however, they certainly are not everywhere defined). 
Fix one such $u$ and call $C_u(x)$ the (exponential) complexity of $x$. By definition, 
$K = K_u$ rearranges $\mathbf{Z}^+$ in order of increasing complexity. In other words, 


$$K(x) := 1 + \mathrm{card}\,\{y \mid C_u(y) < C_u(x)\}.$$


We first show that 

$$K(x) = \exp(O(1))\, C_u(x).$$


Since $C_u$ takes each value at most once, we have $K(n) \le C_u(n)$. In order to 
show that $C_u(x) \le cK(x)$ for some $c$ it suffices to check that 


$$\mathrm{card}\,\{k \le N \mid \exists x,\ C_u(x) = k\} \ge bN$$


with some $b > 0$. In fact, at least half of the numbers $x \le N$ have complexity 
that is no less than $x/2$. 

Now, VI.9.7(b) implies that for any recursive function $f$ and all $x \in D(f)$, 
we have $C_u(f(x)) \le \mathrm{const}\cdot C_u(x)$. Since $C_u(x)$ and $K(x)$ have the same order 
of growth up to a bounded factor, our claim follows. 


10.2. Corollary. Denote by $S_\infty^{\mathrm{rec}}$ the group of recursive permutations of $\mathbf{Z}^+$. 
Then $K S_\infty^{\mathrm{rec}} K^{-1}$ is a subgroup of permutations of no more than linear growth. 

Actually, appealing to Proposition VI.9.6, one can considerably 
strengthen this result. For example, let $\sigma$ be a recursive permutation, $\sigma^K = K\sigma K^{-1}$. 
Then $\sigma^K(x) \le cx$, so that $(\sigma^K)^n(x) \le c^n x$ for $n > 0$. But actually the 
last inequality can be replaced by 

$$(\sigma^K)^n(x) \le c'n$$

for a fixed $x$ and variable $n$. With both $x$ and $n$ variable one gets the estimate 
$O(xn \log(xn))$. 

Recall that finite permutations appear in the quantum versions of Boolean 
circuits, because we must treat any function with the help of an appropriate 
unitary operator: cf. the discussion in 7.3 above. 

For the same reason, infinite (computable) permutations might naturally 
appear in models of quantum Turing machines and normal computation models. 
In fact, if one assumes that the transition function $s$ is a permutation, and 
then extends it to the unitary operator $U_s$ in the infinite-dimensional Hilbert 
space, one might be interested in studying the spectral properties of such 
operators. But the latter depend only on the conjugacy class. Perhaps the 
universal conjugation $U_K$ will be a useful theoretical tool in this context. 


10.3. Final comments. Finally, we would like to comment on the hidden role 
of Kolmogorov complexity in the real life of classical computing. 

The point is that in a sense (which is difficult to formalize), we are interested 
only in the calculation of sufficiently nice functions, because a random Boolean 
function will have (super)exponential complexity anyway. 

A nice function, at the very least, has a short description and therefore 
a small Kolmogorov complexity. Thus, dealing with practical problems, we 
actually work not with small numbers, graphs, circuits, ..., but rather with 
an initial segment of the respective constructive world reordered with the help 
of K. We systematically replace a large object by its short description. 

But then the “natural operations” that can be performed on our objects lose 
computability when we have replaced the objects by their short descriptions. 

This inherent tension, incompatibility of shortest descriptions with most- 
economic algorithmic processing, is the central issue of any computation theory. 

The place-value notation of numbers that played such a great role in the 
development of human civilizations is the ultimate system of short descriptions 
that bridges the abyss. Kolmogorov complexity goes far beyond this point. 


X 
Model Theory 


Model theory is the part of logic that studies structures (in Bourbaki’s sense) in 
relation to their descriptions in formal languages, usually first-order ones. The 
study of structures and classes of structures is essentially a subject of algebra 
or universal algebra, but model theory is different in its approach in that it 
places a special emphasis on the question of language and definability in the 
structures. This approach has paid off with applications in various parts of 
concrete mathematics. 


1 Languages and Structures 


Given a language $L$, an $L$-structure (or just structure) is essentially the same
thing as an interpretation of $L$ as explained in Section II.2. But the stress now
is rather on algebra than on logic, so instead of the notation $\varphi$, which realized
the interpretation of the symbols of $L$ in a set $A$, we will refer to the structure
$\mathcal{A} = (A, L)$, which provides an interpretation for the symbols of $L$. We write,
e.g., $\mathcal{A} = (A, +, \cdot, 0, 1)$ when $L = \{+, \cdot, 0, 1\}$. We call $A$ the domain of the
structure $\mathcal{A}$.

Unless stated otherwise, we deal in this chapter with first-order languages. 
For an $L$-formula $P$ one writes $\mathcal{A} \models P$ to say that the value of $P$ under the
interpretation is “true.” Usually, in the above notation we will assume that $P$
is a sentence, that is, a formula with no free variables.

According to this notation, $T_\varphi L$ of I.6.1, for an interpretation $\varphi$ of $L$,
becomes

$$\mathrm{Th}(\mathcal{A}) := \{P : \mathcal{A} \models P\},$$

the theory of the structure $\mathcal{A}$, where $\mathcal{A}$ is the structure given by $\varphi$.

Often, for a formula $P(x_1,\dots,x_n)$ with free variables $x_1,\dots,x_n$ and
elements $a_1,\dots,a_n \in A$ we say $\mathcal{A} \models P(a_1,\dots,a_n)$, meaning that we have
extended the interpretation given by $\mathcal{A}$ to the interpretation of variables
$x_i \mapsto a_i$.

We also assume, as is standard in model theory, that every language contains
the symbol $=$ and its interpretation is always equality, that is, structures are
normal models.


1.1 Embeddings. If $\mathcal{A}$ and $\mathcal{B}$ are $L$-structures with domains $A$ and $B$
correspondingly, an embedding $h: \mathcal{A} \to \mathcal{B}$ is a map $A \to B$ that preserves
the symbols of relations, operations, and constants of $L$, that is,


(i) for any $n$-ary relation symbol $p \in L$ and $a_1,\dots,a_n \in A$,
$\mathcal{A} \models p(a_1,\dots,a_n)$ iff $\mathcal{B} \models p(h(a_1),\dots,h(a_n))$;
(ii) for any $n$-ary operation symbol $f \in L$ and $a_1,\dots,a_n, a \in A$,
$\mathcal{A} \models f(a_1,\dots,a_n) = a$ iff $\mathcal{B} \models f(h(a_1),\dots,h(a_n)) = h(a)$;
(iii) for any constant symbol $c \in L$ and $a \in A$, $\mathcal{A} \models c^{\mathcal{A}} = a$ iff $\mathcal{B} \models c^{\mathcal{B}} = h(a)$,
where $c^{\mathcal{A}}$ stands for the interpretation of $c$ in the structure $\mathcal{A}$.


1.2 Exercise. Any embedding is injective. 
A surjective embedding is called an isomorphism. 


1.3 Definable sets. Recall that for an $L$-structure $\mathcal{A}$ and an $L$-formula
$P(x_1,\dots,x_n)$ one defines (definition II.2.8) the set

$$P(\mathcal{A}) = \{\bar a \in A^n : \mathcal{A} \models P(\bar a)\}.$$

Sets of this form are called definable.


Since any subset of $A^n$ can be viewed as an $n$-ary relation, $P(\bar v)$ determines
also an $L$-definable relation. If $P(\mathcal{A})$ coincides with the graph of an operation
$f : A^{n-1} \to A$, we say that $f$ is an $L$-definable operation.
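As a small illustration of definable sets and definable operations, here is a sketch in Python over a finite structure. The choice of the field with 7 elements, the formula "there is y with y·y = x", and all identifiers are assumptions made for this example only; they do not come from the text.

```python
p = 7
A = range(p)                                   # domain of the field F_7 (assumed example)

def P(x):
    """P(x): exists y (y * y = x), a formula in the language {+, *, 0, 1}."""
    return any((y * y) % p == x for y in A)

# The definable set P(A) = {a in A : A |= P(a)}.
P_of_A = {a for a in A if P(a)}

# The formula N(x, y): x + y = 0 defines a binary relation N(A) in A^2.
N_of_A = {(x, y) for x in A for y in A if (x + y) % p == 0}

# N(A) is the graph of an operation A -> A exactly when every x has one y.
is_graph = all(sum(1 for (x, y) in N_of_A if x == a) == 1 for a in A)
```

Here `P_of_A` is the set of squares, and `N_of_A` is the graph of the operation sending x to −x, so negation is a definable operation in this structure.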


1.4 Exercise. 


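(i) An embedding $h: \mathcal{A} \to \mathcal{B}$ of $L$-structures preserves atomic $L$-formulas,
i.e., for any atomic $P(x_1,\dots,x_n)$ and any $\bar a \in A^n$,


$$\mathcal{A} \models P(\bar a) \text{ iff } \mathcal{B} \models P(h(\bar a)).$$


(ii) Given an $\forall$-formula $P(\bar a)$, that is, one of the form
$\forall x_1 \cdots \forall x_m\, Q(x_1,\dots,x_m,\bar a)$ with $Q$ quantifier-free, and an
embedding $h: \mathcal{A} \to \mathcal{B}$, $\bar a$ in $A$,


$$\mathcal{B} \models P(h(\bar a)) \text{ implies } \mathcal{A} \models P(\bar a).$$


(iii) An isomorphism $h: \mathcal{A} \to \mathcal{B}$ between $L$-structures preserves any $L$-formula
$P(x_1,\dots,x_n)$, i.e., for any $\bar a \in A^n$,


$$\mathcal{A} \models P(\bar a) \text{ iff } \mathcal{B} \models P(h(\bar a)).$$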


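1.5 Corollary. For definable subsets (relations) $P(\mathcal{A})$, any isomorphism
$h: \mathcal{A} \to \mathcal{B}$ of $L$-structures satisfies $h(P(\mathcal{A})) = P(\mathcal{B})$;
in particular, definable subsets in a given structure $\mathcal{A}$ are invariant under the
action of $\mathrm{Aut}(\mathcal{A})$.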

The invariance under Aut is often useful in checking nondefinability of some
subsets or relations.


1.6 Exercise. Multiplication is not definable in $\mathbb{R}_{\mathrm{group}} := (\mathbb{R}, +)$, the
additive group of reals.

The test of invariance works for $\mathbb{R}_{\mathrm{group}}$ because the group $\mathrm{Aut}(\mathbb{R}_{\mathrm{group}})$ is
large; in fact, the structure is homogeneous in the sense that two $n$-tuples satisfy
the same formulas (have the same type) if and only if there is an automorphism
taking one to another, and also every possible type is realized in the given model
of the theory. This is not the case in general. For example, for the structure
$\mathbb{R}_{\mathrm{field}} := (\mathbb{R}, +, \cdot, 0, 1)$ the automorphism group is trivial. We get much better
understanding of definability in this structure by looking into a nonstandard
saturated model of the corresponding theory (see 2.13, 3.11, and 4.6).
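The invariance test can be tried out numerically. The sketch below is only an illustration under stated assumptions: the doubling map is our choice of automorphism of the additive group, and the check runs over a finite sample of rationals (exact arithmetic) rather than over all reals.

```python
from fractions import Fraction
from itertools import product

def h(x):
    """Candidate automorphism of the additive group: doubling (assumed example)."""
    return 2 * x

# A finite sample of rationals standing in for the reals.
sample = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

# h respects addition on every sampled pair.
preserves_add = all(h(x + y) == h(x) + h(y)
                    for x, y in product(sample, repeat=2))

# The graph of multiplication {(x, y, z) : x * y = z} is not invariant under h:
# these are the pairs where h(x * y) differs from h(x) * h(y).
counterexamples = [(x, y) for x, y in product(sample, repeat=2)
                   if h(x * y) != h(x) * h(y)]
```

Since the map preserves addition but moves the graph of multiplication, that graph cannot be definable in the additive structure alone.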


1.7 Definability of a structure. The notion of a definable set can be extended 
to that of a definable structure. 

Let $L_0$ and $L_1$ be languages and for the sake of brevity assume that $L_1$
is a relational language. One says that the language $L_1$ is interpreted in an
$L_0$-structure $\mathcal{A}$ if for some $n$,


there is given an $L_0$-formula $Q(\bar x)$ with $n$ free variables,

there is given an $L_0$-formula $E(\bar x, \bar y)$ with $2n$ free variables,

for every $m$-ary predicate symbol $p_i$ in $L_1$ there is given an $L_0$-formula
$P_i(\bar x_1,\dots,\bar x_m)$ with $mn$ free variables,

such that $E(\mathcal{A})$ is an equivalence relation on the set $Q(\mathcal{A})$ and the $P_i(\mathcal{A})$
are relations on $Q(\mathcal{A})$ preserved by the equivalence $E(\mathcal{A})$.


Under these assumptions one considers the domain $Q(\mathcal{A})/E(\mathcal{A})$ and the
interpretation of the symbols $p_i$ on the domain given by $P_i(\mathcal{A})$.

One says that an $L_1$-structure $\mathcal{M}$ is definable (interpretable) in an
$L_0$-structure $\mathcal{A}$ if the above $L_1$-structure on the domain $Q(\mathcal{A})/E(\mathcal{A})$ is
isomorphic to $\mathcal{M}$.

It is clear from the definition that assuming that $\mathcal{M}$ is defined in $\mathcal{A}$, every
definable set in $\mathcal{M}$ can be rewritten as a definable quotient set in $\mathcal{A}$ and every
$L_1$-sentence holding in $\mathcal{M}$ can be rewritten into an appropriate $L_0$-sentence
holding in $\mathcal{A}$.


1.8 Example. Let $F = (F, +, \cdot, 0, 1)$ be a field and $\mathrm{GL}_n(F)$ a group on the
domain $GL_n(F)$ of $n \times n$ nondegenerate matrices in the language $(*, e)$ of groups.
The natural interpretation of $\mathrm{GL}_n(F)$ is on the domain


$$D := \{X = (x_{ij}) \in F^{n^2} : i,j = 1,\dots,n,\ \det X \neq 0\},$$


with the interpretation of $e$ as the element of $D$ with $x_{ii} = 1$, $x_{ij} = 0$ for all
$i,j \le n$, $i \neq j$, and the operation $X * Y = Z$ interpreted on $D$ by the known
polynomial equations.
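To make the interpretation concrete, the following sketch carries it out for a small case. The field with 3 elements, the value n = 2, and all function names are assumptions chosen so that the whole domain can be enumerated; the text itself works over an arbitrary field.

```python
from itertools import product

p, n = 3, 2                      # assumed small case: the field F_3 and 2x2 matrices

def det2(X):
    a, b, c, d = X               # X = (x11, x12, x21, x22), a point of F^(n^2)
    return (a * d - b * c) % p

# The field-language formula "det X != 0" cuts out the domain D inside F^(n^2).
D = [X for X in product(range(p), repeat=n * n) if det2(X) != 0]

e = (1, 0, 0, 1)                 # x_ii = 1 and x_ij = 0 for i != j

def star(X, Y):
    """X * Y = Z, given by the polynomial equations z_ij = sum_k x_ik * y_kj."""
    return tuple(sum(X[n * i + k] * Y[n * k + j] for k in range(n)) % p
                 for i in range(n) for j in range(n))
```

Everything here is expressed through the field operations alone, which is the point of the example: the group structure lives on a definable subset of a power of the field.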


1.9 Definition. Given two $L$-structures $\mathcal{A}$ and $\mathcal{B}$ and an embedding
$h: \mathcal{A} \to \mathcal{B}$, we say that the embedding is elementary if for any $L$-formula
$P(x_1,\dots,x_n)$ and any $a_1,\dots,a_n \in A$,


$(*)$ $\mathcal{A} \models P(a_1,\dots,a_n)$ iff $\mathcal{B} \models P(h(a_1),\dots,h(a_n))$.

In this situation $\mathcal{A}$ is also said to be an elementary substructure of $\mathcal{B}$ and $\mathcal{B}$ an
elementary extension of $\mathcal{A}$, written $\mathcal{A} \preceq \mathcal{B}$ or $\mathcal{A} \preceq_h \mathcal{B}$.
We say that $\mathcal{A}$ is elementarily equivalent to $\mathcal{B}$, written $\mathcal{A} \equiv \mathcal{B}$, if for any
$L$-sentence $P$,
$\mathcal{A} \models P$ iff $\mathcal{B} \models P$.


It is also useful to consider partial $h : A \to B$ with $\mathrm{dom}\, h = X \subseteq A$ and
$\mathrm{range}\, h = Y \subseteq B$. Provided $(*)$ holds for any $a_1,\dots,a_n \in X$ and any $L$-formula
$P$, such an $h$ is said to be an elementary monomorphism $\mathcal{A} \to \mathcal{B}$.

Before proceeding further we want to make a note on the notion of
deducibility used in model theory. It is semantic, in distinction to the syntactic
one elaborated in Chapter II. In the first-order context these notions are
equivalent due to the Gödel completeness theorem, but in general the semantic
approach is more flexible and can be used when no formal system of rules of
deduction is available.

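Let $\mathcal{E}$ be a set of $L$-sentences. We write $\mathcal{A} \models \mathcal{E}$ if for any $S \in \mathcal{E}$, $\mathcal{A} \models S$.


1.10 Definition. An $L$-sentence $S$ is said to be a logical consequence of a finite
$\mathcal{E}$, written $\mathcal{E} \models S$, if $\mathcal{A} \models \mathcal{E}$ implies $\mathcal{A} \models S$ for every $L$-structure $\mathcal{A}$. For $\mathcal{E}$ infinite,
$\mathcal{E} \models S$ means that there is a finite $\mathcal{E}^0 \subseteq \mathcal{E}$ such that $\mathcal{E}^0 \models S$.

$S$ is called logically valid, written $\models S$, if $\mathcal{A} \models S$ for every $L$-structure $\mathcal{A}$.


1.11 Definition. $\mathcal{E}$ is said to be finitely satisfiable (f.s.) if any finite subset of
$\mathcal{E}$ is satisfiable, that is, has a model.

$\mathcal{E}$ is said to be deductively closed if for any $L$-sentence $S$, $\mathcal{E} \models S$ implies
$S \in \mathcal{E}$.

Clearly, a complete satisfiable $\mathcal{E}$ is deductively closed. In model-theoretic
constructions one often moves between variations of a given language.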


1.12 Definition. Let $\mathcal{A} = (A, L)$ be an $L$-structure and $L'$ a language all of whose
nonlogical symbols are in $L$, that is, $L' \subseteq L$. The structure $\mathcal{A}' = (A, L')$
on the domain $A$ with the symbols of $L'$ interpreted as in $\mathcal{A}$ is called the
$L'$-reduct of $\mathcal{A}$. Conversely, $\mathcal{A}$ is an expansion of $\mathcal{A}'$ to the language $L$.

Obviously, under the notation above, for an $L'$-formula $P(v_1,\dots,v_n)$ and
$a_1,\dots,a_n \in A$,


$$\mathcal{A}' \models P(a_1,\dots,a_n) \text{ iff } \mathcal{A} \models P(a_1,\dots,a_n).$$


1.13 A special and broadly used form of expansion of a structure $\mathcal{A} = (A, L)$
is the expansion by constant symbols naming elements in $A$. For $C \subseteq A$ let
$L_C = L \cup \{c_a : a \in C\}$ be the extension of the language by the constant
symbols and $\mathcal{A}_C$ the natural expansion of $\mathcal{A}$ to $L_C$ assigning to $c_a$ the element $a$.
$L_C$-formulas are then called formulas with parameters in $C$.


2 The Compactness Theorem 


This section discusses the compactness theorem and its various immediate
applications. This theorem was implicit in Gödel’s completeness theorem and
was proved independently by A. Mal’tsev in 1936. Its later proofs based on
Henkin’s method produce more specialized models with more refined
applications.


2.1 Compactness theorem. Let $\mathcal{E}$ be a finitely satisfiable set of $L$-sentences.
Then $\mathcal{E}$ is satisfiable; moreover, $\mathcal{E}$ has a model of cardinality less than or equal
to $|L| + \aleph_0$.

Below we discuss three proofs of the theorem. Note that each of them uses
the axiom of choice; that is, the construction of the model is in general
ineffective.

The first proof is an application of Gödel’s completeness theorem II.6.2 and
uses the deduction system of II.2.2–II.2.5.


2.2 Lemma. $\mathcal{E}$ is consistent.


PROOF. Suppose $\mathcal{E} \vdash P$. Then $\mathcal{E}^0 \vdash P$ for some finite $\mathcal{E}^0 \subseteq \mathcal{E}$, since only finitely
many formulas are involved in the proof. In particular, if $\mathcal{E}$ were inconsistent,
already its finite subset $\mathcal{E}^0$ would be. But then $\mathcal{E}^0$ could not be satisfiable,
contradicting the assumption.


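2.3 Lemma (Lindenbaum’s theorem). $\mathcal{E}$ can be completed; that is, there is
a complete f.s. set of $L$-sentences $\mathcal{E}^\#$ such that $\mathcal{E} \subseteq \mathcal{E}^\#$.
PROOF. (Uses the axiom of choice). Let


$$\mathcal{S} = \{\mathcal{E}' : \mathcal{E} \subseteq \mathcal{E}',\ \mathcal{E}' \text{ an f.s. set of } L\text{-sentences}\}.$$


Clearly $\mathcal{S}$ satisfies the hypothesis of Zorn’s lemma, so it contains a maximal
element, $\mathcal{E}^\#$ say. This is complete, for otherwise, say $S \notin \mathcal{E}^\#$ and $\neg S \notin \mathcal{E}^\#$.
By maximality neither $\{S\} \cup \mathcal{E}^\#$ nor $\{\neg S\} \cup \mathcal{E}^\#$ is f.s. Hence there exist finite
$\mathcal{E}_1 \subseteq \mathcal{E}^\#$ and $\mathcal{E}_2 \subseteq \mathcal{E}^\#$ such that neither $\{S\} \cup \mathcal{E}_1$ nor $\{\neg S\} \cup \mathcal{E}_2$ is satisfiable.
However, $\mathcal{E}_1 \cup \mathcal{E}_2 \subseteq \mathcal{E}^\#$ is finite, so has a model, $\mathcal{A}$ say. But either $\mathcal{A} \models S$, so
$\mathcal{A} \models \{S\} \cup \mathcal{E}_1$, or $\mathcal{A} \models \neg S$, so $\mathcal{A} \models \{\neg S\} \cup \mathcal{E}_2$, a contradiction.

Clearly, $\mathcal{E}^\#$ of the lemma is Gödelian so has a model by II.6.2. This model
is also a model of $\mathcal{E}$. This finishes the first proof.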


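2.4 Exercise. Let $\alpha, \alpha_1,\dots,\alpha_n, \beta, \beta_1,\dots,\beta_n, \gamma$ be closed $L$-terms, $p, f$ $L$-symbols
for $n$-ary predicate and $n$-ary operation, correspondingly, and $P(v_0,v_1,\dots,v_n)$
an $L$-formula with free variables $v_0, v_1,\dots,v_n$. Prove that

(a) $\alpha = \beta \models \beta = \alpha$;

(b) $\alpha = \beta,\ \beta = \gamma \models \alpha = \gamma$;

(c) $\models \alpha = \alpha$;

(d) $\alpha_1 = \beta_1,\dots,\alpha_n = \beta_n,\ p(\alpha_1,\dots,\alpha_n) \models p(\beta_1,\dots,\beta_n)$;

(e) $\alpha = \beta,\ \alpha_1 = \beta_1,\dots,\alpha_n = \beta_n,\ f(\alpha_1,\dots,\alpha_n) = \alpha \models f(\beta_1,\dots,\beta_n) = \beta$;

(f) $P(\beta,\alpha_1,\dots,\alpha_n) \models \exists v_0\, P(v_0,\alpha_1,\dots,\alpha_n)$.


A set $\mathcal{E}$ of $L$-sentences is said to be with witnesses if for any sentence in $\mathcal{E}$
of the form $\exists v P(v)$ there is a closed $L$-term $\lambda$ such that $P(\lambda) \in \mathcal{E}$.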


2.5 Exercise. There exists a closed $L$-term if there exists a set of $L$-sentences
that is complete, with witnesses, and f.s. (Consider the $L$-sentence $\exists v\, v = v$.)


2.6 Lemma. For some extension $L^* \supseteq L$ of the language, with $|L^*| = |L| + \aleph_0$,
there is an extension $\mathcal{E}^* \supseteq \mathcal{E}$ of $\mathcal{E}$ to a complete f.s. set of $L^*$-sentences with
witnesses.


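PROOF. We are going to obtain $L^*$ just by adding $|L| + \aleph_0$ new constant symbols.
We introduce new languages $L_i$ and complete sets of $L_i$-sentences $\mathcal{E}_i$ ($i =
0, 1, \dots$). Let $L_0 = L$. By Lindenbaum’s theorem there exists $\mathcal{E}_0 \supseteq \mathcal{E}$, a complete
f.s. set of $L_0$-sentences.

Given an f.s. $\mathcal{E}_i$ in language $L_i$, introduce the new language


$$L_{i+1} = L_i \cup \{c_Q : Q \text{ a one-variable } L_i\text{-formula}\}$$

and the new set of $L_{i+1}$-sentences


$$\mathcal{E}_i^* = \mathcal{E}_i \cup \{(\exists v\, Q(v) \to Q(c_Q)) : Q \text{ a one-variable } L_i\text{-formula}\}.$$


Claim. $\mathcal{E}_i^*$ is finitely satisfiable. Indeed, given a finite $\mathcal{E}' \subseteq \mathcal{E}_i^*$, let $\mathcal{E}'' = \mathcal{E}' \cap \mathcal{E}_i$
and take a model $\mathcal{A}$ of $\mathcal{E}''$ with a domain $A$, which we assume well-ordered.
Assign constants to symbols $c_Q$ as follows:


$$c_Q = \begin{cases} \text{the first element in } Q(\mathcal{A}), & \text{if } Q(\mathcal{A}) \neq \emptyset, \\ \text{the first element in } A, & \text{otherwise.} \end{cases}$$


Denote the expanded structure by $\mathcal{A}^*$. By definition, for all $Q(v)$, $\mathcal{A}^* \models
(\exists v\, Q(v) \to Q(c_Q))$. So $\mathcal{A}^* \models \mathcal{E}'$. This proves the claim.

Let $\mathcal{E}_{i+1}$ be a complete f.s. set of $L_{i+1}$-sentences containing $\mathcal{E}_i^*$.

Take $\mathcal{E}^* = \bigcup_{i \in \mathbb{N}} \mathcal{E}_i$. This is finitely satisfiable. By construction one sees
immediately that $\mathcal{E}^*$ is with witnesses and is complete in the language $\bigcup_{i \in \mathbb{N}} L_i =
L + \{\text{new constants}\}$.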
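The assignment of witnesses used in the claim can be sketched on a finite structure. This is an illustration only: the six-element domain with its natural ordering and the three sample formulas, written as Python predicates, are assumptions made for the example.

```python
def witness(domain, Q):
    """Interpretation of the new constant c_Q in a well-ordered domain."""
    satisfying = [a for a in domain if Q(a)]
    # First element of Q(A) if Q(A) is nonempty, else the first element of A.
    return satisfying[0] if satisfying else domain[0]

def henkin_axiom_holds(domain, Q):
    """Truth value of (exists v Q(v)) -> Q(c_Q) in the expanded structure."""
    exists = any(Q(a) for a in domain)
    return (not exists) or Q(witness(domain, Q))

A = list(range(6))            # assumed toy domain, ordered 0 < 1 < ... < 5
formulas = {
    "v*v = 4": lambda v: v * v == 4,
    "v > 3": lambda v: v > 3,
    "v + 1 = 0": lambda v: v + 1 == 0,   # has no solution in A
}
witnesses = {name: witness(A, Q) for name, Q in formulas.items()}
```

The third formula has no solution, so its constant receives the default value, and the implication holds because its premise fails.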




An $L$-structure $\mathcal{A}$ is called named if for every $a \in A$ there is a closed $L$-term
$\lambda$ such that $\lambda^{\mathcal{A}} = a$.


2.7 Proposition. For any complete f.s. set $\mathcal{E}$ of $L$-sentences with witnesses
there is a named model.


PROOF. Let $\Lambda$ be the set of closed terms of $L$. This is nonempty by 2.5. For
$\alpha, \beta \in \Lambda$ define $\alpha \sim \beta$ if $\alpha = \beta \in \mathcal{E}$.

This is an equivalence relation by 2.4(a)–(c).

For $\alpha \in \Lambda$, let $\tilde\alpha$ denote the $\sim$-equivalence class containing $\alpha$. Let


$$A = \{\tilde\alpha : \alpha \in \Lambda\}.$$


This will be the domain of our model $\mathcal{A}$. We want to define relations, operations,
and constants of $L$ on $A$.
Let $p$ be an $n$-ary relation symbol of $L$ and $\alpha_1,\dots,\alpha_n \in \Lambda$. Define


$$\mathcal{A} \models p(\tilde\alpha_1,\dots,\tilde\alpha_n) \text{ iff } p(\alpha_1,\dots,\alpha_n) \in \mathcal{E}.$$


By 2.4(d) the definition does not depend on the choice of representatives in the
$\sim$-classes.




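For an operation symbol $f$ of $L$ of arity $m$ and $\alpha_1,\dots,\alpha_m \in \Lambda$, set


$$\mathcal{A} \models f(\tilde\alpha_1,\dots,\tilde\alpha_m) = \tilde\tau, \text{ where } \tau = f(\alpha_1,\dots,\alpha_m).$$


By 2.4(e) the operation $f$ in $\mathcal{A}$ is well-defined.

Finally, for a constant symbol $c$, $c^{\mathcal{A}}$ is just $\tilde c$.

We now prove by induction on the complexity of an $L$-formula $Q(v_1,\dots,v_n)$
that


$(*)$ $\mathcal{A} \models Q(\tilde\alpha_1,\dots,\tilde\alpha_n)$ iff $Q(\alpha_1,\dots,\alpha_n) \in \mathcal{E}$.

For atomic formulas we have this by definition.

If $Q = (Q_1 \wedge Q_2)$ then $\mathcal{A} \models (Q_1(\tilde\alpha_1,\dots,\tilde\alpha_n) \wedge Q_2(\tilde\alpha_1,\dots,\tilde\alpha_n))$ iff
$\mathcal{A} \models Q_1(\tilde\alpha_1,\dots,\tilde\alpha_n)$ and $\mathcal{A} \models Q_2(\tilde\alpha_1,\dots,\tilde\alpha_n)$ iff (by induction hypothesis)
$Q_1(\alpha_1,\dots,\alpha_n), Q_2(\alpha_1,\dots,\alpha_n) \in \mathcal{E}$ iff (by deductive closedness)
$(Q_1(\alpha_1,\dots,\alpha_n) \wedge Q_2(\alpha_1,\dots,\alpha_n)) \in \mathcal{E}$, which proves $(*)$ in this case.

The case $Q = \neg P$ is proved similarly.

In case $Q = \exists v P$, $\mathcal{A} \models \exists v P(v,\tilde\alpha_1,\dots,\tilde\alpha_n)$ iff there is $\tilde\beta \in A$ such that
$\mathcal{A} \models P(\tilde\beta,\tilde\alpha_1,\dots,\tilde\alpha_n)$ iff there is $\beta \in \Lambda$ such that $P(\beta,\alpha_1,\dots,\alpha_n) \in \mathcal{E}$. The
latter implies, by 2.4(f) and deductive closedness, that $\exists v P(v,\alpha_1,\dots,\alpha_n) \in \mathcal{E}$,
and the converse holds because $\mathcal{E}$ is with witnesses. This proves $(*)$ for the
formula and finishes the proof of $(*)$ for all formulas.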


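2.8 The second proof of the compactness theorem.

By 2.6, $\mathcal{E} \subseteq \mathcal{E}^*$, for some complete f.s. set $\mathcal{E}^*$ of $L^*$-sentences with witnesses,
$|L^*| = |L| + \aleph_0$. By 2.7 this has a named model, say $\mathcal{A}^*$. Since every element of a
named model is the value of a closed term, $|A^*| \le |L| + \aleph_0$, and clearly the reduct
of $\mathcal{A}^*$ to the language $L$ is a model of $\mathcal{E}$.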


2.9 The third proof of the compactness theorem uses ultraproducts of
models.
Let $B$ be a Boolean algebra. A filter in $B$ is a subset $U \subseteq B$ such that


(i) $0 \notin U$;
(ii) $X \in U$, $X \subseteq Y \in B \Rightarrow Y \in U$;
(iii) $X, Y \in U \Rightarrow X \cap Y \in U$.
A filter $U$ is called an ultrafilter if also
(iv) for all $Y \in B$, either $Y \in U$ or $I \setminus Y \in U$.


A filter $U$ on $B$ is said to be principal if there is $X_0 \in U$ such that $X_0 \subseteq X$
for all $X \in U$. Otherwise, we say that $U$ is nonprincipal.

In this section we deal with the case that $B$ is the Boolean algebra of all
subsets of a given set $I$. Then $U$ is said to be a filter on $I$.
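On a finite index set the definitions can be checked by brute force. The sketch below is an illustration under stated assumptions: it takes a three-element set, reads condition (i) as saying that the empty set is not in the filter, and reads "principal" as requiring the generating set to belong to the filter. All names are ours.

```python
from itertools import combinations

I = frozenset(range(3))       # assumed small index set
subsets = [frozenset(c) for r in range(len(I) + 1)
           for c in combinations(sorted(I), r)]

def is_filter(U):
    return (frozenset() not in U                                   # (i)
            and all(Y in U for X in U for Y in subsets if X <= Y)  # (ii)
            and all(X & Y in U for X in U for Y in U))             # (iii)

def is_ultrafilter(U):
    # (iv): for every Y, either Y or its complement belongs to U
    return is_filter(U) and all((Y in U) or (I - Y in U) for Y in subsets)

def is_principal(U):
    return any(all(X0 <= X for X in U) for X0 in U)

# Enumerate all 2^8 families of subsets and keep the ultrafilters.
families = [frozenset(c) for r in range(len(subsets) + 1)
            for c in combinations(subsets, r)]
ultrafilters = [U for U in families if is_ultrafilter(U)]
```

The search finds one ultrafilter per point of the set, each of them principal. Nonprincipal ultrafilters exist only on infinite sets, and their existence relies on the axiom of choice, so they cannot be listed this way.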

Now let $\mathcal{A}_i = (A_i, L)$, $i \in I$, be a set of $L$-structures and $U$ a filter on $I$. We
are going to construct a new structure, denoted by $\prod_{i \in I} \mathcal{A}_i / U$, using the data.

Let $\prod_{i \in I} A_i$ stand for the Cartesian product of the sets, that is, the set of
all functions $\varphi : I \to \bigcup_{i \in I} A_i$ with $\varphi(i) \in A_i$. Define an equivalence relation
(check it) on the set $\prod_{i \in I} A_i$:


$$\varphi \sim_U \psi \text{ if } \{i \in I : \varphi(i) = \psi(i)\} \in U$$


(we say “$\varphi$ is equal to $\psi$ almost everywhere modulo $U$”).

Now we denote by $\prod_{i \in I} A_i / U$ the quotient of $\prod_{i \in I} A_i$ by the
equivalence $\sim_U$. This is going to be the domain of the structure under
construction; an element of it represented by $\varphi \in \prod_{i \in I} A_i$ will be denoted by $\tilde\varphi$.

We interpret a symbol $p$ of an $n$-ary relation on $\prod_{i \in I} A_i / U$ by assuming
$p(\tilde\varphi_1,\dots,\tilde\varphi_n)$ true if $\{i : \mathcal{A}_i \models p(\varphi_1(i),\dots,\varphi_n(i))\} \in U$, that is,
$\mathcal{A}_i \models p(\varphi_1(i),\dots,\varphi_n(i))$ for almost all $i$. It is easy to check that this is well defined.

The same principle is used to interpret the meaning of $f(\tilde\varphi_1,\dots,\tilde\varphi_n) = \tilde\varphi_{n+1}$
for a symbol $f$ of an $n$-ary operation, and similarly the interpretation of $c = \tilde\varphi$
for a constant symbol $c$. This defines the $L$-structure $\prod_{i \in I} \mathcal{A}_i / U$, a filtered
product of $L$-structures along $U$. When $U$ is an ultrafilter, $\prod_{i \in I} \mathcal{A}_i / U$ is called
an ultraproduct. In case $\mathcal{A}_i = \mathcal{A}$ for all $i \in I$, the ultraproduct is called an
ultrapower, written $\mathcal{A}^I / U$.


2.10 Łoś’s theorem. Let $\mathcal{A}_i = (A_i, L)$, $i \in I$, be a set of $L$-structures, $U$ an
ultrafilter on $I$, and $\prod_{i \in I} \mathcal{A}_i / U$ the ultraproduct along $U$.
For every $L$-formula $P(x_1,\dots,x_n)$ with free variables $x_1,\dots,x_n$ and every
$\tilde\varphi_1,\dots,\tilde\varphi_n \in \prod_{i \in I} A_i / U$,


$$\prod_{i \in I} \mathcal{A}_i / U \models P(\tilde\varphi_1,\dots,\tilde\varphi_n) \text{ iff } \{i : \mathcal{A}_i \models P(\varphi_1(i),\dots,\varphi_n(i))\} \in U.$$


PROOF. Induction on the complexity of $P$. For $P(x_1,\dots,x_n)$ of the form
$p(x_1,\dots,x_n)$, $f(x_1,\dots,x_{n-1}) = x_n$, $c = x_n$, for symbols of predicate,
operation, or constant, the statement holds by definition.

Assuming the statement of the theorem for formulas $P_1$ and $P_2$ of a given
complexity, one gets it for the formula $P_1 \,\&\, P_2$ by the property (iii) of a filter.

For a formula of the form $\exists x_{n+1} P_1$, if $\prod_{i \in I} \mathcal{A}_i / U \models \exists x_{n+1} P_1(\tilde\varphi_1,\dots,\tilde\varphi_n, x_{n+1})$,
then by definition, there exists $\tilde\varphi_{n+1}$ in the structure such that $\prod_{i \in I} \mathcal{A}_i / U \models
P_1(\tilde\varphi_1,\dots,\tilde\varphi_n,\tilde\varphi_{n+1})$. By induction, $\mathcal{A}_i \models P_1(\varphi_1(i),\dots,\varphi_n(i),\varphi_{n+1}(i))$ for
almost all $i \in I$ modulo $U$. This implies $\mathcal{A}_i \models \exists x_{n+1} P_1(\varphi_1(i),\dots,\varphi_n(i), x_{n+1})$
for almost all $i \in I$. In the reverse direction, the latter implies the existence of a
function $\varphi_{n+1}$ such that $\mathcal{A}_i \models P_1(\varphi_1(i),\dots,\varphi_n(i),\varphi_{n+1}(i))$ for the same values
of $i \in I$. This proves the inductive step in the case in question.

Since every formula up to logical equivalence can be written in terms of $\&$,
$\exists$, and $\neg$, to complete the proof of the theorem it suffices to check the statement
for a formula of the form $\neg P_1$. This case is immediate by property (iv) defining
an ultrafilter.
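The statement can be checked by hand in a degenerate case. The sketch below is an illustration under assumptions of ours: three finite cyclic groups as factors, a principal ultrafilter (the only kind that can be written down explicitly), and one sample sentence. It builds the quotient by the almost-everywhere equivalence and compares both sides of the theorem.

```python
from itertools import product

moduli = [2, 3, 4]                      # assumed factors A_i = (Z/m_i, +), i in {0, 1, 2}
I = range(len(moduli))
i0 = 1                                  # U = principal ultrafilter generated by {i0}

def in_U(X):
    return i0 in X

def equivalent(phi, psi):
    """phi ~_U psi iff the agreement set {i : phi(i) = psi(i)} lies in U."""
    return in_U({i for i in I if phi[i] == psi[i]})

elements = list(product(*(range(m) for m in moduli)))   # the Cartesian product

# One representative per ~_U class: the domain of the ultraproduct.
reps = []
for phi in elements:
    if not any(equivalent(phi, r) for r in reps):
        reps.append(phi)

def add(phi, psi):
    return tuple((phi[i] + psi[i]) % moduli[i] for i in I)

zero = tuple(0 for _ in I)

def P_in_factor(i):
    """A_i |= exists x (x + x = 0 and not x = 0)."""
    m = moduli[i]
    return any((x + x) % m == 0 and x != 0 for x in range(m))

def P_in_ultraproduct():
    return any(equivalent(add(r, r), zero) and not equivalent(r, zero)
               for r in reps)

los_lhs = P_in_ultraproduct()
los_rhs = in_U({i for i in I if P_in_factor(i)})
```

The sentence holds in two of the three factors, yet that index set is not in the ultrafilter, and accordingly the sentence fails in the ultraproduct. With a principal ultrafilter the ultraproduct is just a copy of the distinguished factor, which is why nonprincipal ultrafilters are the interesting case.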


End of the proof of the compactness theorem. Third version. Without
loss of generality we assume that $\mathcal{E}$ is deductively closed, in particular, if
$S_1,\dots,S_n \in \mathcal{E}$ then $(S_1 \,\&\, \cdots \,\&\, S_n) \in \mathcal{E}$.

By the assumptions, for every sentence $S \in \mathcal{E}$ there exists a model $\mathcal{A}_S$. Now
we introduce an ultrafilter on $\mathcal{E}$. For every $S \in \mathcal{E}$ set $X_S = \{Q \in \mathcal{E} : Q \models S\}$.
Clearly $X_{S_1 \& S_2} = X_{S_1} \cap X_{S_2}$. It follows that the set


$$U_0 = \{Y \subseteq \mathcal{E} : X_S \subseteq Y \text{ for some } S \in \mathcal{E}\}$$


is a filter. By Zorn’s lemma, $U_0$ is contained in a maximal filter $U$, equivalently,
an ultrafilter.
Now by Łoś’s theorem the ultraproduct $\prod_{S \in \mathcal{E}} \mathcal{A}_S / U$ is a model of any $S \in \mathcal{E}$,
so a model for $\mathcal{E}$.


2.11 Topological interpretation. Consider the set $\mathfrak{S}$ of all $L$-structures of
bounded cardinality; $\mathrm{card}\, L + \aleph_0$ will do. Consider the quotient $S = \mathfrak{S}/\!\equiv$, where
$\equiv$ stands for elementary equivalence between $L$-structures. Every $L$-sentence $P$
singles out a subset


$$[P] = \{\mathcal{A} \in \mathfrak{S} : \mathcal{A} \models P\}$$


of $\mathfrak{S}$ and a corresponding subset of $S$. Consider the topology on $S$ with an open
basis given by sets of the form $[P]$. The statement of the compactness theorem
can be reformulated as follows:

The topological space $S$ of $L$-structures is compact.

Now let $I \subseteq S$ be a set of points in the space and $U$ an ultrafilter on $I$. In
a compact Hausdorff space there exists a unique limit point along the given
ultrafilter, $\lim_U I$. This point is provided by the ultraproduct construction and
Łoś’s theorem. Namely, $\lim_U I$ is given by the equivalence class represented by
$\prod_{i \in I} \mathcal{A}_i / U$, with $\mathcal{A}_i \in \mathfrak{S}$ representing corresponding points $i \in I$.


2.12 Ultrapowers. Once ultraproducts were discovered it was noticed that
ultrapowers $\mathcal{A}^I / U$ by a nonprincipal ultrafilter provide a special kind of model
of a given complete theory. Note that by Łoś’s theorem,


$$\mathcal{A} \preceq \mathcal{A}^I / U.$$


And this elementary extension of $\mathcal{A}$ has a remarkable property: every sequence
$\{a_i : i \in I\}$ of elements of $A$ has a “limit” $a$ in $\mathcal{A}^I / U$ along the ultrafilter. Just
take $a$ to be $\tilde\varphi$ for $\varphi : i \mapsto a_i$.

The limit in question can be defined properly in a topology on $A$ similar
to that of 2.11. Consider the topology $T_{\mathrm{Def}}$ on $A$ whose basic closed subsets
are the definable subsets of $A$ (in later sections we will add to these the
subsets definable with parameters). Our $a$ is a limit point of the sequence in this
topology.

Much more can be said about an ultrapower, but in general, its properties
depend essentially on the choice of the ultrafilter and on set-theoretic
assumptions. The simplest case is one of a nonprincipal ultrafilter on a countable set $I$
assuming also CH. We also assume the language $L$ and the structure $\mathcal{A}$ to be
countable. Under these assumptions every countable sequence in $\mathcal{A}^I / U$ has a
$T_{\mathrm{Def}}$-limit point in $\mathcal{A}^I / U$. This important property is called saturation and will
be discussed in detail later. Here we only quote one of the remarkable corollaries
of saturation of ultrapowers.


The Keisler–Shelah theorem. For $L$-structures $\mathcal{A}$ and $\mathcal{B}$,


$\mathcal{A} \equiv \mathcal{B}$ iff for some $I$ and an ultrafilter $U$ on $I$, $\mathcal{A}^I / U \cong \mathcal{B}^I / U$.


H. Keisler proved this theorem in 1961 assuming CH. In fact, under CH,
for countable $L$, $\mathcal{A}$ and $\mathcal{B}$, one can restrict $I$ to be a countable set and $U$ any
nonprincipal ultrafilter on $I$.
Later, in the 1970s, S. Shelah produced a clever combinatorial proof
avoiding CH.

Finally, we remark here that ultrapowers and, more generally, ultraproducts
found many applications (e.g., a construction of Gromov’s asymptotic cones by
van den Dries and Wilkie), but nowadays the preference in most cases is given
to an equivalent treatment, via saturated models.


2.13 Nonstandard models of classical theories. A very simple application 
of the compactness theorem establishes the existence of nonstandard models of 
such theories as arithmetic, real analysis, and others. 

Let $\mathbb{N} = (\mathbb{N}, +, \cdot, 0, 1)$ be the usual structure on nonnegative integers in the
language of arithmetic, $L_1\mathrm{Ar}$ (which is also used as a language for fields). The
theory $\mathrm{Th}(\mathbb{N})$ is called complete arithmetic (to be distinguished from Peano
arithmetic, given by a system of axioms that is incomplete).

Any model of $\mathrm{Th}(\mathbb{N})$ distinct from (not isomorphic to) $\mathbb{N}$ is called a
nonstandard arithmetic. The existence of one such is immediate by the
compactness theorem once one considers the set of $L_1\mathrm{Ar}(c)$-sentences


$$\mathcal{E} = \mathrm{Th}(\mathbb{N}) \cup \{\neg\, c = n : n = 0, 1, \dots\},$$


where $L(c)$ stands for the extension of the language $L$ by a constant symbol (or
a set of constant symbols).

Clearly, $\mathcal{E}$ is finitely satisfiable and any of its models, reduced to the language
$L_1\mathrm{Ar}$, is nonstandard. One easily sees (prove it) that necessarily $c > n$ for every
$n$ (for the given theory $x_1 < x_2$ replaces $\exists y\; x_1 + y = x_2$); that is, nonstandard
elements of arithmetic are “infinite integers.”
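Finite satisfiability here amounts to a one-line observation, sketched below with names of our own choosing: any finite collection of the new sentences excludes only finitely many values for c, so the standard model itself can interpret c by a number not yet excluded.

```python
def interpret_c(excluded):
    """A value for c in N satisfying every sentence 'not c = n' with n in excluded."""
    return max(excluded, default=-1) + 1

# Assumed sample of finite fragments, each given by the set of excluded numerals.
finite_fragments = [set(), {0}, {0, 1, 2}, {5, 17, 1000}]
values = [interpret_c(F) for F in finite_fragments]
```

No single choice works for all fragments at once, which is why the model of the whole set must contain a new element.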

One can be more creative in constructing nonstandard integers in non- 
standard models by choosing a more interesting € and ending up with, say, a 
nonstandard integer that is divisible by any standard n. 

It is useful to see how a nonstandard model can be obtained using
ultraproducts. Let $U$ be a nonprincipal ultrafilter on an infinite set $I$, and


$${}^*\mathbb{N} = \mathbb{N}^I / U,$$


the ultrapower of $\mathbb{N}$, that is, by Łoś’s theorem, a model of the complete
arithmetic. Let $\varphi : I \to \mathbb{N}$ be a function that is not constant on any $X \in U$. Clearly,
$\tilde\varphi$ is a nonstandard integer. In particular, for $I = \mathbb{N}$ and $\varphi : n \mapsto n!$, the
nonstandard integer $\tilde\varphi$ is divisible by any standard one.
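The divisibility claim rests on the fact that, for each standard k, the set of indices n where k divides n! is cofinite, and a nonprincipal ultrafilter contains every cofinite set. The following sketch checks the finite part of this numerically; the bound 200 and the range of k are arbitrary choices of ours.

```python
from math import factorial

def exceptions(k, bound=200):
    """Indices n < bound for which k does not divide n!."""
    return [n for n in range(bound) if factorial(n) % k != 0]

# For each standard k in a small range, list the exceptional indices.
table = {k: exceptions(k) for k in range(1, 31)}
```

In every row the exceptions lie below k, so the set of good indices has a finite complement and therefore belongs to the ultrafilter.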

Let us introduce now a first-order formalism for real analysis, which is
weaker than $L_2\mathrm{Real}$ of III.2 but powerful enough to express many interesting
problems. The language $L_1\mathrm{Real}$ consists of symbols of operations, one for each
$n$-ary function $f : \mathbb{R}^n \to \mathbb{R}$. Observe that this is enough to express the relation
$(x_1,\dots,x_n) \in S$ for any given subset $S \subseteq \mathbb{R}^n$; just use the characteristic
function of $S$. In particular, any real number is named by a symbol of operation.
We reserve the standard notation for symbols of operations $+, \cdot, -, /$ as well as
for standard relations on $\mathbb{R}$.

Let $\mathbb{R}_{\mathrm{analysis}}$ be the obvious $L_1\mathrm{Real}$-structure on $\mathbb{R}$. This we assume to be
the standard model of real analysis. Correspondingly, any model of $\mathrm{Th}(\mathbb{R}_{\mathrm{analysis}})$
other than the standard one is nonstandard; for short, we call it a nonstandard
model of the reals.

We claim that any nonstandard model ${}^*\mathbb{R}$ of the reals contains an element
$\alpha$ such that


$$0 < \alpha < \frac{1}{n} \qquad (1)$$


for every positive integer $n$. Indeed, there must be a new, unnamed element,
say $\gamma$, in ${}^*\mathbb{R}$. Let


$$[\gamma]^- = \{a \in \mathbb{Q} : a < \gamma\}, \qquad [\gamma]^+ = \{a \in \mathbb{Q} : a > \gamma\}$$


be the corresponding Dedekind cut, where we allow one of the sets to be empty.
If, say, $[\gamma]^- = \emptyset$, set $\alpha := -\gamma^{-1}$, which satisfies (1). Similarly in the other case.
So, we may assume that both parts of the cut are nonempty. Let $r$ be the unique
(standard) real number defined by the cut.

We have either $r < \gamma$ or $r > \gamma$. Assuming the first, set $\alpha := \gamma - r$. This
satisfies (1). In the second case set $\alpha := r - \gamma$. This proves the claim.

Call $\alpha$ satisfying (1) a positive infinitesimal. An infinitesimal is a nonstandard
real that is equal to $\alpha$ or $-\alpha$ for a positive infinitesimal $\alpha$.

Call a nonstandard $\gamma$ infinite if $[\gamma]^+$ or $[\gamma]^-$ is empty. Otherwise $\gamma$ is said to
be bounded.

It can now easily be checked that the subset $B \subseteq {}^*\mathbb{R}$ of bounded elements
forms a ring, and its subset $\mu \subseteq B$ of infinitesimals is its maximal ideal. The
rule $\mathrm{st} : r + \alpha \mapsto r$, for $r \in \mathbb{R}$, $\alpha \in \mu$, determines a well-defined surjective
homomorphism of rings $B \to \mathbb{R}$, called the standard part map. Obviously, when
identified with the (partial) map ${}^*\mathbb{R} \to \mathbb{R}$, this is exactly the residue map
corresponding to the unique valuation on ${}^*\mathbb{R}$ with the valuation ring $B$.

Now let $f : \mathbb{R} \to \mathbb{R}$ be a function. By assumption, our language contains a
symbol of operation $f$ interpreted as $f$. Let ${}^*f : {}^*\mathbb{R} \to {}^*\mathbb{R}$ be the function in
the nonstandard model corresponding to $f$. Similarly for notations of subsets.

The following is easy to check:

$f$ is continuous in the interval $(r_1, r_2)$ iff ${}^*f(x + \alpha) - {}^*f(x)$ is infinitesimal,
for any $x \in (r_1, r_2)$ and any infinitesimal $\alpha$.

$g$ is a derivative of $f$ on $(r_1, r_2)$ iff $g(x) = \mathrm{st}\big(({}^*f(x + \alpha) - {}^*f(x))/\alpha\big)$ for any
standard real $x \in (r_1, r_2)$ and any infinitesimal $\alpha$.
and so on.
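As a worked instance of the derivative criterion, take the function $f(x) = x^2$ (our choice of example). Since the sentence $\forall y\, f(y) = y \cdot y$ holds in the standard model, it also holds in ${}^*\mathbb{R}$, so for a standard real $x$ and an infinitesimal $\alpha$ one may compute:

```latex
\[
  \operatorname{st}\!\left(\frac{{}^*f(x+\alpha) - {}^*f(x)}{\alpha}\right)
  = \operatorname{st}\!\left(\frac{(x+\alpha)^2 - x^2}{\alpha}\right)
  = \operatorname{st}(2x + \alpha)
  = 2x .
\]
```

The last step uses that $2x$ is standard and $\alpha$ lies in the ideal of infinitesimals, so the standard part map discards it; this recovers the usual derivative $g(x) = 2x$.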

One can also extend the definitions of nonstandard analysis to analysis in 
Hilbert and Banach spaces, to measure theory, and indeed to any part of math- 
ematics that deals with limits. 

Nonstandard analysis provides a solid foundation to Leibniz’s idea of
infinitesimal calculus. It allows a convenient graphical formalism for operating
with limits and infinities and as such leads to a number of beautiful proofs,
sometimes new. Yet in its general form the method has obvious limits; after
all, it is just a reformulation of analysis in metamathematical terms based on
the compactness theorem. A much deeper mathematics based on understanding
definability has been developed in concrete cases for tame theories, such as the
theory of the field of reals $(\mathbb{R}, +, \cdot, 0, 1)$ or $(\mathbb{R}, +, \cdot, 0, 1, \exp)$, the field of reals
with exponentiation. The way forward is in classifying definable relations in a
given structure and, eventually, understanding the structure of saturated models
of the corresponding theory. This method is called elimination of quantifiers;
see Section 3.18.


3 Basic Methods and Constructions 


3.1 Definition. We will call a set $T$ of $L$-sentences an $L$-theory, or simply
theory, if $T$ is satisfiable and deductively closed.

A subset $\mathcal{E}$ of $T$ such that $T$ is the set of all logical consequences of $\mathcal{E}$ is said
to be a set of axioms of $T$.


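3.2 Method of diagrams. For an $L$-structure $\mathcal{A}$ let $L_A = L \cup \{c_a : a \in A\}$
be the expansion of the language, $\mathcal{A}_A$ the natural expansion of $\mathcal{A}$ to $L_A$
assigning to $c_a$ the element $a$. Define the diagram of $\mathcal{A}$ to be

$$\mathrm{Diag}(\mathcal{A}) = \{S : S \text{ an atomic or negated atomic } L_A\text{-sentence such that } \mathcal{A}_A \models S\}$$

and the complete diagram of $\mathcal{A}$ to be


$$\mathrm{CDiag}(\mathcal{A}) = \{S : S \text{ an } L_A\text{-sentence such that } \mathcal{A}_A \models S\}.$$


Theorem (Method of Diagrams). For an $L$-structure $\mathcal{B}$,


(i) there is an expansion $\mathcal{B}_A$ to the language $L_A$ such that $\mathcal{B}_A \models \mathrm{Diag}(\mathcal{A})$ iff
$\mathcal{A} \subseteq \mathcal{B}$.

(ii) there is an expansion $\mathcal{B}_A$ to the language $L_A$ such that $\mathcal{B}_A \models \mathrm{CDiag}(\mathcal{A})$
iff $\mathcal{A} \preceq \mathcal{B}$.


PROOF. By definition, $a \mapsto c_a^{\mathcal{B}_A}$ is an embedding iff $\mathcal{B}_A \models \mathrm{Diag}(\mathcal{A})$.
The same holds for an elementary embedding and $\mathrm{CDiag}(\mathcal{A})$.


Corollary. Given an $L$-structure $\mathcal{A}$ and an $L$-theory $T$,


(i) the set $T \cup \mathrm{Diag}(\mathcal{A})$ is finitely satisfiable iff there is a model $\mathcal{B}$ of $T$ such
that $\mathcal{A} \subseteq \mathcal{B}$.

(ii) the set $T \cup \mathrm{CDiag}(\mathcal{A})$ is finitely satisfiable iff there is a model $\mathcal{B}$ of $T$ such
that $\mathcal{A} \preceq \mathcal{B}$.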


3.3 Application. Local theorems of Mal’tsev. In the 1940s A. Mal’tsev
proved a number of theorems dealing with embeddings of some algebraic
structures into others using the compactness theorem, or, more specifically, the
method of diagrams. He called this type of theorem local in the sense that
it used the fact that if a certain property holds for finitely generated
subalgebras (holds locally) then it holds for the algebra itself. We present an example
of such a theorem.

Recall that a group $G$ is said to be linear of rank $n$ if it is isomorphic to a
subgroup of $\mathrm{GL}_n(F)$ for some field $F$.

Theorem (A. Mal’tsev). $G$ is linear of rank $n$ if every finitely generated
subgroup of $G$ is.


PROOF. We use the notion of definability of structures explained in 1.7 and
the example 1.8. Observe that the interpretation of $\mathrm{GL}_n(F)$ in a field $F$ is
independent of $F$.

Let $G = (G, *, e)$ be a locally linear group of rank $n$, that is, with the
property that every finitely generated subgroup of it is embeddable into $\mathrm{GL}_n(F)$
for some field $F$.

Consider the theory $T_F$ stating the axioms of fields in the language $(+, \cdot, 0, 1)$.
Consider the diagram $\mathrm{Diag}(G)$ of the group.

Let $D((x_{ij}))$ be the formula in $n^2$ variables in the language of fields defining
the set


$$\{(a_{ij}) \in F^{n^2} : i,j = 1,\dots,n,\ \det(a_{ij}) \neq 0\}.$$


Now we want to rewrite the diagram of $G$ by a diagram $\mathrm{Diag}^F(G)$ in the
language of fields extended by constant symbols. For each constant symbol $c^g$
naming an element $g$ of $G$ we introduce $n^2$ constant symbols $c^g_{ij}$, $i,j \in \{1,\dots,n\}$,
and include in $\mathrm{Diag}^F(G)$ the formula $D((c^g_{ij}))$ for every $g \in G$. For each
subformula in the diagram of the form $c^g * c^h = c^{gh}$, include in $\mathrm{Diag}^F(G)$ the
formula


$$\sum_k c^g_{ik}\, c^h_{kj} = c^{gh}_{ij}.$$


Consider the set of sentences

$$T = T_F \cup \mathrm{Diag}^F(G).$$


The assumption that every finitely generated subgroup of $G$ is isomorphic to
a subgroup of a $\mathrm{GL}_n(F)$ guarantees that $T$ is finitely satisfiable. By 3.2 the
theorem follows.


3.4 Löwenheim–Skolem theorem. Suppose $T$ is an $L$-theory having an
infinite model $\mathcal{A}$. Then for every $\kappa \ge \mathrm{card}\, L + \aleph_0$ there is a model $\mathcal{B}$ of $T$
of cardinality equal to $\kappa$.


PROOF. In case $\mathrm{card}\, A \le \kappa$ we will construct $\mathcal{B}$ such that $\mathcal{A} \preceq \mathcal{B}$. This is called
the upward Löwenheim–Skolem theorem.

Consider the extension of the language $L_A$ by the new constant symbols $c_\alpha$,
$\alpha < \kappa$, and consider the set of sentences


$$\mathrm{CDiag}(\mathcal{A}) \cup \{\neg\, c_\alpha = c_\beta : \alpha < \beta < \kappa\}.$$


This is finitely satisfiable because $A$ is infinite. So it has a model $\mathcal{B} \succeq \mathcal{A}$ of
cardinality not bigger than that of the language, that is, $\le \kappa$. But each $c_\alpha$ is
interpreted by a different element of $B$, so $\mathrm{card}\, B = \kappa$.

In case $\mathrm{card}\,\mathcal{A} > \kappa$ one proves the downward Löwenheim–Skolem theorem,
which provides a $\mathcal{B}$ of cardinality $\kappa$ as an elementary substructure of $\mathcal{A}$.
Start with a nonempty subset $B_0 \subseteq A$ of cardinality $\kappa$. Fix some $a_0 \in B_0$.
For each $L$-formula $P(v_1,\dots,v_n)$ define a function $g_P : A^{n-1} \to A$ by

$$g_P(a_1,\dots,a_{n-1}) = \begin{cases} \text{an element } a \in A \text{ with } \mathcal{A} \models P(a_1,\dots,a_{n-1},a), & \text{if such exists},\\ a_0, & \text{if not} \end{cases}$$

(the $g_P$ are called Skolem functions).

Let $B$ be the closure of $B_0$ under all the $g_P$. This is closed under all the
$L$-operations $f$, since any such $(n-1)$-ary $f$ coincides with the Skolem function
$g_P$ for $P$ the formula $f(v_1,\dots,v_{n-1}) = v_n$. Let $\mathcal{B}$ be the structure on $B$ induced
from $\mathcal{A}$. It is easy to prove, by induction on the complexity of formulas, that for
any $L$-formula $Q(v_1,\dots,v_n)$ and any $b_1,\dots,b_n \in B$,

$$\mathcal{B} \models Q(b_1,\dots,b_n) \iff \mathcal{A} \models Q(b_1,\dots,b_n),$$

that is, $\mathcal{B} \preceq \mathcal{A}$, of cardinality $\kappa$, as required.
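The closure step can be made concrete in a finite toy setting. The sketch below is only an illustration under stated assumptions: genuine Skolem functions involve an arbitrary choice of witnesses and are not computable in general, so the hypothetical helper `closure` simply closes a seed set under a given finite list of operations (here, addition modulo 12), which is the mechanical part of the construction of $B$ from $B_0$.

```python
from itertools import product

def closure(seed, operations):
    """Smallest superset of `seed` closed under every (arity, function) pair.

    `operations` is a list of (arity, function) pairs; this stands in for
    the family of Skolem functions, which in the text is indexed by formulas.
    """
    closed = set(seed)
    while True:
        # apply every operation to every tuple of already known elements
        new = {fn(*args)
               for arity, fn in operations
               for args in product(closed, repeat=arity)} - closed
        if not new:          # nothing new appeared: the set is closed
            return closed
        closed |= new

# Toy structure (an assumption for illustration): integers modulo 12 with +.
ops = [(2, lambda x, y: (x + y) % 12)]
hull = closure({8}, ops)     # the subgroup generated by 8
```

In this toy case the closure of `{8}` is the subgroup generated by 8; in the proof the same fixed-point idea is applied to all Skolem functions at once, and a cardinality count shows that the closure still has cardinality $\kappa$.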


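3.5 Elementary chains of models. Let, for an ordinal $\kappa$,

$$A_0 \subseteq A_1 \subseteq \cdots \subseteq A_\alpha \subseteq \cdots \quad (\alpha < \kappa) \qquad (2)$$

be a $\kappa$-sequence of $L$-structures forming a chain with respect to embeddings,
with $A_\delta$ for limit ordinals $\delta \le \kappa$ defined as follows:

the domain $A_\delta = \bigcup_{\alpha<\delta} A_\alpha$,

predicate $p^{A_\delta} = \bigcup_{\alpha<\delta} p^{A_\alpha}$, for each predicate symbol $p$ of $L$,

operation $f^{A_\delta} : A_\delta^n \to A_\delta$ maps $\bar a$ to $b$ iff $\bar a$ is in $A_\alpha$ for some $\alpha < \delta$ and
$f^{A_\alpha}(\bar a) = b$, for each ($n$-ary) operation symbol $f$ of $L$,

and $c^{A_\delta} = c^{A_0}$, for each constant symbol $c$ from $L$.

The chain (2) is said to be elementary if for each $\alpha$,
$$A_\alpha \preceq A_{\alpha+1}.$$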


3.6 Lemma. For an elementary chain (2), $A_\alpha \preceq A_\delta$ for any $\alpha < \delta \le \kappa$.


PROOF. Clearly, it suffices to prove the statement for all limit ordinals $\delta \le \kappa$.
By induction we may assume that $A_\alpha \preceq A_\beta$, for all $\alpha < \beta < \delta$.
Now, in order to prove $A_\alpha \preceq A_\delta$, we prove

$$A_\alpha \models Q(\bar a) \iff A_\delta \models Q(\bar a) \qquad (*)$$

for all $L$-formulas $Q(\bar x)$ and $\bar a$ in $A_\alpha$, by induction on the complexity of $Q$.

We may assume that $Q$ is constructed from atomic formulas using $\wedge$, $\neg$, and
$\exists$ only.

For $Q$ atomic, $(*)$ follows from the fact that $A_\alpha \subseteq A_\delta$, an embedding. The
cases of $Q = Q_1 \wedge Q_2$ and $Q = \neg Q_1$ are easy. In the case $Q(\bar x) = \exists y\, P(\bar x, y)$
the $\Rightarrow$ side of $(*)$ follows immediately from the induction hypothesis and the
meaning of $\exists$.
Proof of $\Leftarrow$: $A_\delta \models \exists y\, P(\bar a, y)$ implies $A_\delta \models P(\bar a, b)$, for some $b \in A_\delta$, so
$b \in A_\beta$ for some $\alpha \le \beta < \delta$. By the induction hypothesis $A_\beta \models P(\bar a, b)$. The
latter implies $A_\beta \models \exists y\, P(\bar a, y)$ and so $A_\alpha \models \exists y\, P(\bar a, y)$, since $A_\alpha \preceq A_\beta$.


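3.7 Types. We fix a complete $L$-theory $T$. A set $\tau$ of $L$-formulas $P(\bar x)$ with
$n$ free variables $\bar x = (x_1,\dots,x_n)$ is called an $n$-type (in $T$) if for any
$P_1(\bar x),\dots,P_k(\bar x) \in \tau$,

$$T \models \exists \bar x \bigwedge_{i \le k} P_i(\bar x).$$

A type $\tau$ is called complete if also for any $P(\bar x)$ either $P(\bar x) \in \tau$ or $\neg P(\bar x) \in \tau$.
A type $\tau$ is called principal if there is $P(\bar x)$ such that $T \models \exists \bar x\, P(\bar x)$ and for
any $Q(\bar x) \in \tau$, $T \models \forall \bar x\,(P(\bar x) \to Q(\bar x))$.
$P$ is then called a principal formula for the type $\tau$.


A type that is not principal is called nonprincipal.
Example. The set of formulas $\{0 < x < \frac{1}{n} : 0 < n \in \mathbb{N}\}$ is a 1-type in
the theory of reals $\mathrm{Th}(\mathbb{R}_{\mathrm{field}})$. (Here, $0 < x < \frac{1}{n}$ stands for $0 < x \ \&\ n \cdot x < 1$,
where $x < y$ is written for $\exists z\,(z \neq 0 \ \&\ y = x + z^2)$.)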


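Suppose $\bar a \in A^n$. Then we define the $L$-type of $\bar a$ in $A$,

$$\mathrm{tp}_A(\bar a) = \{P(\bar x) : A \models P(\bar a)\}.$$

Clearly, $\mathrm{tp}_A(\bar a)$ is a complete $n$-type.


Remarks.


(i) When $A \subseteq B$ then $\mathrm{tp}_A(\bar a)$ and $\mathrm{tp}_B(\bar a)$ may be different. But it follows
immediately from the definitions that

$$A \preceq B \ \text{ implies } \ \mathrm{tp}_A(\bar a) = \mathrm{tp}_B(\bar a).$$
(ii) If $\pi : A \to B$ is an isomorphism, $\bar a \in A^n$, $\bar b \in B^n$, and $\pi : \bar a \mapsto \bar b$, then
$\mathrm{tp}_A(\bar a) = \mathrm{tp}_B(\bar b)$.


We say that an $n$-type $p$ is realized in $A$ if there is $\bar a \in A^n$ such that
$p \subseteq \mathrm{tp}_A(\bar a)$.
If there is no such $\bar a$ in $A$ we say that $p$ is omitted in $A$.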


3.8 Exercise. A principal type p is realized in any model A of T. 


3.9 Lemma. Given a set $T = \{\tau_\alpha : \alpha < \kappa\}$ of $n$-types, an $L$-structure $A$, and
a cardinal $\kappa \ge \max\{|A|, |L|\}$, there is a $B \succeq A$ of cardinality $\kappa$ such that all
types from $T$ are realized in $B$.


PROOF. In view of 3.6 it suffices to prove the statement for $T$ consisting of
just one $n$-type $\tau$. Consider the expansion $L_{A,c}$ of $L_A$ by new constants $c_1,\dots,c_n$
and the theory
$$T_{A,c} = \mathrm{CDiag}(A) \cup \{P(c_1,\dots,c_n) : P(x_1,\dots,x_n) \in \tau\}.$$

It is immediate from the definition of type that $T_{A,c}$ is finitely satisfiable.

By the compactness theorem there is a model $B_{A,c} \models T_{A,c}$ of cardinality
at most $\mathrm{card}\,A + \aleph_0$. Since $B_{A,c} \models \mathrm{CDiag}(A)$, the $L$-reduct $B$ of $B_{A,c}$ is an
elementary extension of $A$.


3.10 Example. Any proper elementary extension of the standard model R of
the reals (in the language containing $+$ and $\cdot$) realizes the infinitesimal type, by
the argument in 2.13. This remarkable property is equivalent to the statement
that R is complete in the standard metric.


3.11 Saturation. Given an infinite cardinal $\kappa$, a structure $A$ is called
$\kappa$-saturated if for any cardinal $\lambda < \kappa$ and for any expansion $A_C$ of $A$ by constant
symbols $C = \{c_i : i < \lambda\}$, every 1-type in $\mathrm{Th}(A_C)$ is realized in $A_C$.

We say just saturated instead of $\kappa$-saturated when $\kappa = \mathrm{card}\,A$.


Remark. A finite structure $A$ is $\kappa$-saturated for every $\kappa$.
Theorem. Let $T$ be a complete theory.


(i) For every $\kappa \ge \mathrm{card}\,T$ there exists a $\kappa$-saturated model of $T$ of cardinality
$\le 2^\kappa$.
(ii) Any two saturated models of T of the same cardinality are isomorphic. 


PROOF. (i) We use here a standard construction.
We assume that $T$ has infinite models. Let $A$ be a model of $T$ of cardinality
$\kappa$. By 3.9 there is an elementary extension $A' \succeq A$ such that any 1-type in
$\mathrm{Th}(A)$ over any $C \subseteq A$ with $\mathrm{card}\,C < \kappa$ is realized in $A'$.
Denote $A$ by $A^{(0)}$ and then construct, using 3.9 repeatedly, an elementary
chain of models
$$A^{(0)} \preceq A^{(1)} \preceq \cdots \preceq A^{(\alpha)} \preceq \cdots$$

of length $\mu$, for $\mu \ge \kappa$ a regular cardinal ($\mu = \kappa^+$ will always do) such that
$A^{(\alpha+1)}$ realizes all 1-types over subsets of $A^{(\alpha)}$ of cardinality less than $\kappa$. Then
the union $A^* = \bigcup_{\alpha<\mu} A^{(\alpha)}$ of the elementary chain, by Lemma 3.6, is an
elementary extension of $A$, and indeed of each $A^{(\alpha)}$. By construction, for any
subset $C$ of the domain $A^*$ of cardinality $< \kappa$ one can find $\lambda < \mu$ such that
$C \subseteq \bigcup_{\alpha<\lambda} A^{(\alpha)} \subseteq A^{(\lambda)}$. It follows that $A^*$ is a $\kappa$-saturated model of $T$. This
proves (i).

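(ii) We use the above method in combination with the back-and-forth method.

Let
$$A = \{a_i : 0 \le i < \kappa\}, \qquad B = \{b_i : 0 \le i < \kappa\}$$

be the domains of saturated models $\mathcal{A}$ and $\mathcal{B}$ of cardinality $\kappa$, with ordinal
orderings. We construct by induction on $\alpha \le \kappa$ the subsets $A_\alpha \subseteq A$ and $B_\alpha \subseteq B$
with orderings

$$A_\alpha = \{a^i : i < \alpha\}, \qquad B_\alpha = \{b^i : i < \alpha\}$$
satisfying the conditions
$$\mathrm{tp}(a^{j_1},\dots,a^{j_m}) = \mathrm{tp}(b^{j_1},\dots,b^{j_m}) \qquad (3)$$
for any finite sequences $0 \le j_1 < \cdots < j_m < \alpha$;

if $\delta + 2n < \alpha$, $\delta$ limit, $n \in \omega$, then $a_{\delta+n} \in A_\alpha$; (4)
if $\delta + 2n + 1 < \alpha$, $\delta$ limit, $n \in \omega$, then $b_{\delta+n} \in B_\alpha$. (5)

Clearly, (3) implies that $a^j \mapsto b^j$ is an elementary monomorphism $A_\alpha \to B_\alpha$.
When we reach $\alpha = \kappa$, this together with (4) and (5) will give us an isomorphism
$\mathcal{A} \cong \mathcal{B}$.

For $\alpha = 1$, take $a^0 := a_0$ and choose $b^0$ to be the first element among the $b_i$
satisfying the type $\mathrm{tp}(a^0)$.

Now assume that $A_\alpha$ and $B_\alpha$ have been constructed. We introduce constant
symbols $c^j$ naming the $a^j$ in $\mathcal{A}$ and $b^j$ in $\mathcal{B}$. Set $C_\alpha = \{c^j : j < \alpha\}$.

If $\alpha$ is of the form $\delta + 2n$ and $a_{\delta+n} \notin A_\alpha$, we choose $a^\alpha := a_{\delta+n}$. If already
$a_{\delta+n} \in A_\alpha$, we skip the step. Then we choose $b^\alpha$ to be the first element among
the $b_i$ satisfying the type $\mathrm{tp}(a^\alpha / C_\alpha)$. Such a $b_i$ does exist since $\mathrm{card}\,C_\alpha < \kappa$
and $\mathcal{B}$ is $\kappa$-saturated.

If $\alpha$ is of the form $\delta + 2n + 1$ and $b_{\delta+n} \notin B_\alpha$, we choose $b^\alpha := b_{\delta+n}$. Then we
choose $a^\alpha$ to be the first element among the $a_i$ satisfying the type $\mathrm{tp}(b^\alpha / C_\alpha)$.

In each case, (3)–(5) are satisfied for $\alpha + 1$.

On limit steps $\lambda$ of the construction we take

$$A_\lambda = \bigcup_{\alpha<\lambda} A_\alpha, \qquad B_\lambda = \bigcup_{\alpha<\lambda} B_\alpha.$$

This has the desired properties.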


3.12 In case $\kappa$ is regular, e.g., $\kappa = 2^\lambda = \lambda^+$ for some cardinal $\lambda$, the construction
in the proof of 3.11(i) produces a $\kappa$-saturated model of cardinality $\kappa$. In particular,
assuming GCH, saturated models exist, and assuming CH, there exist
saturated models of countable theories of cardinality the continuum or less.


3.13 The back-and-forth method used in the proof of (ii) above is a universal
tool in model theory, apparently first used by G. Cantor in his construction of
the isomorphism between countable dense orders. In fact, Cantor’s theorem is
a special case of 3.11(ii), since a dense linear order is $\aleph_0$-saturated.
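Cantor's construction can be sketched for finitely many steps. The code below is a minimal illustration, not the construction of the text verbatim: it assumes two concrete countable dense orders without endpoints (all rationals and the dyadic rationals, with the hypothetical enumerations `rationals` and `dyadics`) and alternates "forth" and "back" steps, each time taking the first element in the enumeration that sits in the same cut as its partner. Types over finitely many parameters reduce here to the order position, which is what `same_cut` checks.

```python
from fractions import Fraction
from itertools import count

def rationals():
    """Enumerate all rationals (with repetitions, which do no harm)."""
    for bound in count(1):
        for d in range(1, bound + 1):
            for k in range(-bound * d, bound * d + 1):
                yield Fraction(k, d)

def dyadics():
    """Enumerate all dyadic rationals k / 2**j (with repetitions)."""
    for bound in count(1):
        d = 2 ** bound
        for k in range(-bound * d, bound * d + 1):
            yield Fraction(k, d)

def same_cut(a, b, pairs):
    """b lies among the matched b's exactly as a lies among the matched a's."""
    return all((a < x) == (b < y) and (a == x) == (b == y) for x, y in pairs)

def back_and_forth(enum_a, enum_b, steps):
    """Build a finite order-preserving partial map by alternating steps."""
    pairs = []
    for step in range(steps):
        if step % 2 == 0:   # forth: first unmatched a, then a fitting b
            a = next(x for x in enum_a() if all(x != u for u, _ in pairs))
            b = next(y for y in enum_b() if same_cut(a, y, pairs))
        else:               # back: first unmatched b, then a fitting a
            b = next(y for y in enum_b() if all(y != v for _, v in pairs))
            a = next(x for x in enum_a() if same_cut(x, b, pairs))
        pairs.append((a, b))
    return pairs

pairs = back_and_forth(rationals, dyadics, 10)
```

Each search terminates because both orders are dense and have no endpoints; continuing through all finite steps exhausts both enumerations and yields an isomorphism.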

It follows from 3.11 that if $T_1$ and $T_2$ are complete theories in the same
language having saturated models $A_1$ and $A_2$, respectively, of the same
cardinality, then $T_1 = T_2$ iff $A_1 \cong A_2$. This is a powerful criterion of
elementary equivalence in case the existence of saturated models can be established
(see also 2.12). In general a saturated model may not exist without assuming
some form of the generalized continuum hypothesis, but there are ways, using
set-theoretic analysis, around this problem.

In fact, there is a way, less algebraic but more universal, to apply a
back-and-forth procedure to establish completeness of theories.
3.14 A back-and-forth system between $L$-structures $A$ and $B$ is a nonempty set
$I$ of isomorphisms of substructures of $A$ and substructures of $B$ such that

$a \in \mathrm{Dom}\, f_0$ and $a' \in \mathrm{Range}\, f_0$, for some $f_0 \in I$, and

(forth) for every $f \in I$ and $a \in A$ there is a $g \in I$ such that $f \subseteq g$ and
$a \in \mathrm{Dom}\, g$;

(back) for every $f \in I$ and $b \in B$ there is a $g \in I$ such that $f \subseteq g$ and
$b \in \mathrm{Range}\, g$.

It is easy to adjust the proof above to prove the following. 


3.15 Theorem (Ehrenfeucht–Fraïssé criterion for saturated models).
Given $\aleph_0$-saturated $L$-structures $A$ and $B$, $A \equiv B$ if and only if there exists a
back-and-forth system between the two structures.

In view of this theorem and similar facts, in model theory one often operates 
under the principle that there is no harm in assuming GCH. 

Saturated structures play an important role in model theory. The reader
familiar with algebraic geometry could compare it with the role played by a
universal domain in the sense of A. Weil, that is, a field of infinite transcendence
degree. In fact, it is convenient in a concrete context of a given complete theory
$T$ to fix a $\kappa$-saturated model $M$ with $\kappa$ “large enough” (to all intents and
purposes). Such a model is often called the universal domain for $T$. In
model-theoretic slang one more often refers to $M$ as the monster model.


3.16 Homogeneity. One says that a structure A is homogeneous if for any 
subset X of A of cardinality strictly less than card A an elementary monomor- 
phism h : A — A with domain X can be extended to an automorphism 
of A. 

A standard application of the back-and-forth method furnishes the following 
fact: A saturated structure is homogeneous. 


3.17 Omitting types. Despite the importance of saturated models, the ability 
to construct a model in which certain types are omitted is key in the analysis of 
the variety of models and technically much more difficult (the model theorists’
folklore of the 1960s put it: any fool can realize a type but it takes a model theorist
to omit one). For example, there is a model of the theory Th(R) of the field of
reals that omits types of all transcendental reals. This follows from results in 
4.6 below. Using Henkin’s construction of models, R. Vaught proved that if T is 
a theory in a countable language then any countable collection of nonprincipal 
types can be omitted in some countable model of T. 
We would also like to mention the following important result. 


Theorem (Ehrenfeucht–Mostowski). Let $T$ be a complete theory of a countable
language and assume that $T$ has infinite models. Given an infinite cardinal
$\lambda$, there is a model $A$ of cardinality $\lambda$ that realizes at most $\aleph_0$ complete $n$-types
for every $n \in \mathbb{N}$. Moreover, every two $n$-tuples satisfying the same complete
type are conjugated by an automorphism of $A$.

To prove the theorem one uses the well-known Ramsey theorem of infinite
combinatorics in combination with more traditional methods. We skip the proof,
which can be found elsewhere.
3.18 Quantifier elimination. The criterion of elementary equivalence above
can be adapted for classifying elementarily equivalent $n$-tuples in a given structure
and, moreover, classifying definable subsets of a given structure.


Proposition. Given a saturated $L$-structure $A$ and two $n$-tuples $\bar a$ and $\bar b$ in $A$,
$$\mathrm{tp}(\bar a) = \mathrm{tp}(\bar b) \ \text{ iff there is } \pi \in \mathrm{Aut}(A) \text{ s.t. } \pi : \bar a \mapsto \bar b.$$

PROOF. We need to prove only the left-to-right implication. Extend the language
by $n$ constant symbols to name $\bar a$, in the first case, and to name $\bar b$, in
the second one. We obtain two expansions $A_{\bar a}$ and $A_{\bar b}$ of $A$ to the extended
language, both still saturated. The proposition follows by 3.11.


Define the quantifier-free type of a tuple $\bar a$ in $A$,

$$\mathrm{qftp}_A(\bar a) := \{Q(\bar x) : Q \text{ quantifier-free},\ A \models Q(\bar a)\}.$$


Theorem. Given a saturated model $A$ of a complete theory $T$, the following
two conditions are equivalent:


(i) for any two $n$-tuples $\bar a$ and $\bar b$ in $A$,
$\mathrm{qftp}(\bar a) = \mathrm{qftp}(\bar b)$ iff there is $\pi \in \mathrm{Aut}(A)$ s.t. $\pi : \bar a \mapsto \bar b$;


(ii) any $L$-formula with $n$ free variables is equivalent to a quantifier-free
$L$-formula.


PROOF. Assuming (ii), any $n$-type is equivalent to a quantifier-free one. So,
(ii) $\Rightarrow$ (i) by the proposition.
We prove the converse. Let $Q(\bar x)$ be an $L$-formula with free variables $\bar x$,

$$\tau_Q = \{P(\bar x) : P \text{ quantifier-free},\ A \models \forall \bar x\,(Q(\bar x) \to P(\bar x))\}.$$


Claim. $\tau_Q \cup \{\neg Q\}$ is inconsistent.

Indeed, otherwise in the saturated $A$ there is a realization $\bar b$ of the type
$\tau_Q$ together with $\neg Q$. Then $\mathrm{qftp}(\bar b)$ will be consistent with $Q$, for otherwise
$\neg R(\bar x)$ is in $\tau_Q$ for some $R \in \mathrm{qftp}(\bar b)$. Then there exists $\bar e$ realizing $\mathrm{qftp}(\bar b)\ \&\ Q$,
a contradiction with (i): $\bar e$ and $\bar b$ have the same quantifier-free type, but $\bar e$
satisfies $Q$ and $\bar b$ does not.

It follows from the claim, by the compactness theorem, that for some $S(\bar x)$,
a conjunction of finitely many formulas of $\tau_Q$, $A \models \forall \bar x\,(S(\bar x) \to Q(\bar x))$. But by
definition, also $A \models \forall \bar x\,(Q(\bar x) \to S(\bar x))$, so in $A$ and in $T$, $Q$ is equivalent to a
quantifier-free formula $S$.


3.19 Remark. The quantifier elimination criterion above may look somewhat
restricted by the assumption of the existence of a saturated model. In fact,
using 3.15 one can drop the restriction at the cost of having a more complex
condition in (i).

4 Completeness and Quantifier Elimination in Some
Theories


4.1 The theory of an algebraically closed field. $\mathrm{ACF}_p$, the theory of an
algebraically closed field of characteristic $p > 0$, is given by the following axioms
in the language $L_1\mathrm{Ar}$ with the binary operations $+$, $\cdot$ and constant symbols 0
and 1:


I. Axioms of fields.
II. The axioms of algebraic closedness: for each positive $n \in \mathbb{N}$,

$$\forall y_1,\dots,y_n\, \exists x\ \ x^n + y_1 x^{n-1} + \cdots + y_n = 0.$$

III. The axiom of characteristic $p$:

$$\underbrace{1 + \cdots + 1}_{p} = 0.$$

The theory $\mathrm{ACF}_0$ of algebraically closed fields of characteristic zero is given
by axioms I, II and negations of axioms III for all prime $p$:

$$\neg\ \underbrace{1 + \cdots + 1}_{p} = 0.$$


Remark. It is immediate by the axioms that the ultraproduct $\prod_{p \in \mathrm{Primes}} K_p / U$
of models $K_p$ of $\mathrm{ACF}_p$ along a nonprincipal ultrafilter is a model of $\mathrm{ACF}_0$.
Moreover, if an $L_1\mathrm{Ar}$-sentence $P$ holds in all but finitely many $K_p$, $p \in \mathrm{Primes}$,
then $P$ holds in an algebraically closed field of characteristic zero.


4.2 Theorem (Tarski). $\mathrm{ACF}_p$ is complete and allows quantifier elimination.


PROOF. In essence the theorem follows from the well-known Steinitz theorem:
Given two algebraically closed fields $A$ and $B$ of the same characteristic $p$ and
their common subfield $k$,

$$A \cong_k B \ \text{ if and only if } \ \mathrm{trd}(A/k) = \mathrm{trd}(B/k),$$

where trd is the transcendence degree of the field over the subfield, the cardinality
of a maximal algebraically independent subset of the field over the subfield.

Consider two $\aleph_0$-saturated models $A$ and $B$ of $\mathrm{ACF}_p$ of the same uncountable
cardinality $\kappa$ and let $\bar a$ be an $n$-tuple in $A$, $\bar b$ an $n$-tuple in $B$ such that for
every polynomial $p(x_1,\dots,x_n)$ over the prime field $k_0$,

$$p(\bar a) = 0 \ \text{ iff } \ p(\bar b) = 0.$$

Note that under the assumptions, the fields $k_0(\bar a)$ and $k_0(\bar b)$ are isomorphic
by the unique isomorphism $\pi$ sending $\bar a$ to $\bar b$. So, we may assume that $k_0(\bar a) =
k_0(\bar b) = k$ is a common subfield of $A$ and $B$.

Clearly $\mathrm{trd}(A/k) = \kappa = \mathrm{trd}(B/k)$, so by Steinitz there is an isomorphism
$\pi : A \to B$ such that $\pi(\bar a) = \bar b$. This proves, by 3.15, that $\mathrm{ACF}_p$ is
complete. When we consider $A = B$, the existence of the automorphism $\pi$
affirms elimination of quantifiers, by 3.18.


Corollary (Strong Lefschetz principle). For an $L_1\mathrm{Ar}$-sentence $P$ the
following are equivalent:


(i) $\mathbb{C} \models P$;
(ii) $F \models P$, for any algebraically closed field $F$ of characteristic 0;


(iii) $\bar{\mathbb{F}}_p \models P$, for all but finitely many primes $p$.


Here and below $\bar{\mathbb{F}}_p$ is the algebraic closure of the $p$-element field $\mathbb{F}_p$.

The original Lefschetz principle is an informally established fact known to
algebraic geometers: an algebro-geometric statement proven in the context of
complex algebraic geometry holds also for any abstract algebraically closed
field of characteristic zero.


4.3 Constructible sets. The quantifier elimination statement in the language
$L_1\mathrm{Ar}$ can be translated into the following form, also known as Chevalley’s
theorem: given an algebraically closed field $F$, the family of $L_1\mathrm{Ar}$-definable
(using parameters) subsets of $F^n$, for all $n$, coincides with the family of
constructible subsets.

Here constructible means a set representable as a Boolean combination of
zero-sets of polynomials.

Note also that the family of $L_1\mathrm{Ar}$-definable sets is the same as the family
of sets obtained from zero-sets of polynomials by applying Boolean operations
(union, intersection, complement) and projections $F^{n+1} \to F^n$.


4.4 Definable functions. An easy analysis of constructible sets and 
constructible functions (those with constructible graphs) yields the following: 

Let $F$ be an algebraically closed field of characteristic 0, $V \subseteq F^n$ a
constructible subset, and $f : V \to F$ a constructible function defined everywhere
on $V$. Then there is a constructible partition $V = V_1 \cup \cdots \cup V_k$ such that for
each $i \in \{1,\dots,k\}$,

$$f|_{V_i}(\bar v) = \frac{p_i(\bar v)}{q_i(\bar v)} \quad \text{for all } \bar v \in V_i,$$

$p_i, q_i$ polynomials over $F$, $q_i$ not vanishing on $V_i$.

A corollary of the above is this: Let $V \subseteq F^n$, $W \subseteq F^m$ be constructible
subsets and $f : V \to W$ a constructible map, $\mathrm{Dom}\, f = V$, $\mathrm{Range}\, f = W$, in
an algebraically closed field $F$ of characteristic 0. Then there are constructible
partitions

$$V = V_1 \cup \cdots \cup V_k, \qquad W = W_1 \cup \cdots \cup W_k,$$

such that for each $i \in \{1,\dots,k\}$, $f(V_i) = W_i$ and $f|_{V_i}$ coincides with a rational
map (given by $(g_{i1}(\bar v),\dots,g_{im}(\bar v))$, each $g_{ij}(\bar v)$ a rational function with a
denominator not vanishing on $V_i$).


4.5 Application (J. Ax). Let $V = V(\mathbb{C})$ be an abstract algebraic variety and
$f : V \to V$ a regular injective map. Then $f$ is surjective.
PROOF. First we note that the abstract algebraic variety $V$ is definably equivalent
to a constructible subset $W \subseteq \mathbb{C}^n$ for some $n$. By this we mean that $V$, given
as an atlas of charts $V = \bigcup_{i \le k} V_i$, with each $V_i$ in a bijective correspondence $\varphi_i$
with an affine variety $U_i$, glued together by regular maps $\varphi_{ij}$, $i,j \in \{1,\dots,k\}$,
can instead be put in a bijective correspondence $\psi : V \to W$ with the definable
set in such a way that the induced maps $U_i \to W$ and the corresponding gluing
maps are definable using parameters. (Of course, the Zariski topology in this
representation is ignored.)

As a result, we reformulate the data:

$f : W \to W$ is an injective definable map on a definable subset $W \subseteq \mathbb{C}^n$,
both using parameters $a_1,\dots,a_m \in \mathbb{C}$. So, we write $f(w)$ as $F(\bar a, w)$ and $W$ as
$W_{\bar a}$, which are now written in terms of $+$, $\cdot$, 0, and 1. Importantly, $F(\bar a, w)$ is a
piecewise rational map, by 4.4.

The condition on $\bar a$ expressing the fact that $F(\bar a, w)$ is an injective map
$W_{\bar a} \to W_{\bar a}$ defined on the whole of $W_{\bar a}$ can be written as an $\forall$-formula (check it).
Call this formula $\mathrm{Inj}^F(\bar a)$.

Now suppose toward a contradiction that $f$ is not surjective. Then the
sentence

$$P: \quad \exists \bar z\, \exists u\, \bigl(\mathrm{Inj}^F(\bar z)\ \&\ u \in W_{\bar z}\ \&\ \forall x \in W_{\bar z}\ F(\bar z, x) \neq u\bigr)$$

holds in $\mathbb{C}$. Hence by the strong Lefschetz principle, for some prime $p$, $\bar{\mathbb{F}}_p \models P$.
So, for some $\bar b$ and $c$ in $\bar{\mathbb{F}}_p$,

$$\bar{\mathbb{F}}_p \models \mathrm{Inj}^F(\bar b)\ \&\ c \in W_{\bar b}\ \&\ \forall x \in W_{\bar b}\ F(\bar b, x) \neq c.$$

The formula in question is clearly equivalent to an $\forall$-formula. Hence, by 1.4(ii),
for any subfield $k \subseteq \bar{\mathbb{F}}_p$ containing $\bar b$ and $c$,

$$k \models \mathrm{Inj}^F(\bar b)\ \&\ c \in W_{\bar b}\ \&\ \forall x \in W_{\bar b}\ F(\bar b, x) \neq c.$$

We can choose $k$ to be a finite subfield and thus get a statement that $F(\bar b, x)$
defines an injective map of $W_{\bar b}(k)$ into itself that is not surjective. This contradicts
the fact that $W_{\bar b}(k)$ is finite.
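The finite core of the argument, that an injective self-map of a finite set is surjective, can be checked by brute force in the simplest case. The sketch below is an illustration only, under the assumption that $W$ is the whole affine line over the prime field $\mathbb{Z}/p$ and the map is a polynomial of degree less than $p$; the helper names are hypothetical.

```python
from itertools import product

def polynomial_map(coeffs, p):
    """Values of x -> sum(coeffs[i] * x**i) on Z/p, listed for x = 0..p-1."""
    return [sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
            for x in range(p)]

def is_injective(values):
    return len(set(values)) == len(values)

def is_surjective(values, p):
    return set(values) == set(range(p))

def injective_implies_surjective(p):
    """Check the implication for every polynomial of degree < p over Z/p."""
    return all(
        is_surjective(values, p)
        for coeffs in product(range(p), repeat=p)
        for values in [polynomial_map(coeffs, p)]
        if is_injective(values)
    )

cube_values = polynomial_map([0, 0, 0, 1], 5)    # x -> x^3 on Z/5
square_values = polynomial_map([0, 0, 1], 5)     # x -> x^2 on Z/5
```

On $\mathbb{Z}/5$ the cube map is a bijection, while the square map is neither injective nor surjective. The proof above transfers exactly this finite pigeonhole fact to $\mathbb{C}$ through the strong Lefschetz principle.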

James Ax observed this, by then unknown, fact in his paper “The elementary 
theory of finite fields”, Ann. of Math. 88 (1968), 239-271. Later, G. Shimura 
gave a proof of Ax’s theorem by means of reduction mod p. A. Borel published 
a third proof based on cohomology with compact supports, Injective endomor- 
phisms of algebraic varieties, Archiv der Mathematik, 1968. 


4.6 The theory of real closed fields. A natural language for this theory is
$L_{\mathrm{RCF}} = \{+, \cdot, <, 0, 1\}$. The axioms of RCF, the theory of real closed
fields, are:


I. The axioms of ordered fields.
II. $\forall x\,(0 < x \to \exists y\ y^2 = x)$.
III. For every odd $n$ the axiom

$$\forall y_1,\dots,y_n\, \exists x\ \ x^n + y_1 x^{n-1} + \cdots + y_n = 0.$$
We note that II and III together are equivalent to the sign change statement:
for every polynomial $f(x)$ over the field, if for some $a < b$, $f(a) \cdot f(b) < 0$, then
there is $c$, $a < c < b$, with $f(c) = 0$.

Among standard algebraic facts about real closed fields are the following. 


Lemma. Let $A, B$ be real closed fields and $A_0 \subseteq A$, $B_0 \subseteq B$ subfields such that
$A_0 \cong_\varphi B_0$. Then


(i) the isomorphism $\varphi$ can be extended to an isomorphism $\psi : \tilde A_0 \to \tilde B_0$
between the relative algebraic closures $\tilde A_0$, $\tilde B_0$ of the respective subfields in
$A$ and $B$.

(ii) assuming that $A_0$ and $B_0$ are respectively algebraically closed in $A$ and
$B$, $a_0 \in A \setminus A_0$, $b_0 \in B \setminus B_0$ such that for any $a \in A_0$, $a_0 < a$ if and only
if $b_0 < \varphi(a)$, then $\varphi$ can be extended to an isomorphism of ordered fields
$\psi : A_0(a_0) \to B_0(b_0)$.

(iii) assuming that $A_0$ is algebraically closed in $A$, a finite system of inequalities
$f(x) < 0$, for $f(x) \in A_0[x]$, has a solution in $A$ if and only if it has a solution
in $A_0$.


4.7 Theorem (Tarski–Seidenberg). The theory RCF is complete and allows
quantifier elimination.


PROOF. We use the same method as in 4.2. Let $A$ and $B$ be two real closed
$\kappa$-saturated fields.

Claim. Suppose $A_0$, $B_0$ are respectively subfields of cardinality less than $\kappa$ of
$A$ and $B$, and suppose $A_0 \cong_\varphi B_0$ as ordered rings. Then for every $a \in A$ there
are $b \in B$ and an extension $\psi$ of the isomorphism $\varphi$ such that $A_0(a) \cong_\psi B_0(b)$.

For $a$ algebraic over $A_0$ the claim follows by (i) of the lemma in 4.6.

For $a$ transcendental we first consider the quantifier-free $L_{\mathrm{RCF}}(A_0)$-type
$\tau_A$ of $a$, that is, the set of formulas $f(x) > 0$ for polynomials $f(x)$ over $A_0$,
holding for $x = a$. We obtain a quantifier-free $L_{\mathrm{RCF}}(B_0)$-type $\tau_B$ by replacing
parameters from $A_0$ in every $f(x)$ by corresponding parameters in $B_0$. Note that
$\tau_B$ is a type, that is, it is consistent in the theory of $B$, by (iii) of the lemma
in 4.6. Now we use the assumption of $\kappa$-saturation and find an element $b \in B$
realizing the type $\tau_B$. Define $\psi : A_0(a) \to B_0(b)$ as the unique isomorphism of
fields extending $\varphi$ with $\psi(a) = b$. By construction $\psi$ also preserves the order. Claim proved.

The completeness of RCF is now immediate from the claim by 3.15.

To establish quantifier elimination consider in a given $\kappa$-saturated $A$ two
$n$-tuples $\bar a$ and $\bar b$ satisfying the same quantifier-free $L_{\mathrm{RCF}}$-formulas. It follows
that the subfields $\mathbb{Q}(\bar a)$ and $\mathbb{Q}(\bar b)$ are isomorphic as ordered fields by the
isomorphism sending $\bar a$ to $\bar b$. Now the above claim allows a construction of a
back-and-forth system between $A_{\bar a}$, that is, $A$ with $\bar a$ named by constants,
and $A_{\bar b}$, with $\bar b$ named by the same constants. By 3.18 quantifier elimination
follows.


4.8 Semialgebraic sets and semialgebraic functions. Semialgebraic
sets are the solution sets of equations $p(x_1,\dots,x_n) = 0$ and inequalities
$q(x_1,\dots,x_n) > 0$, for $p, q$ polynomials over R, and those obtained from
such solution sets by means of finite intersections and unions. Clearly, by
definition these are exactly the sets quantifier-free definable in R. So the
Tarski–Seidenberg theorem 4.7 in effect says that sets definable (using parameters)
in R are precisely the semialgebraic sets. In particular, the projection of
a semialgebraic set is semialgebraic.
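A standard textbook illustration of the last statement (a worked example, with the letters $b$, $c$ chosen here for the coefficients): the set $\{(b,c,x) \in \mathbb{R}^3 : x^2 + bx + c = 0\}$ is semialgebraic, and its projection along the $x$-axis is defined by an existential formula that becomes quantifier-free after completing the square and using the fact that the nonnegative elements of a real closed field are exactly the squares.

```latex
\[
\exists x\,\bigl(x^{2} + b x + c = 0\bigr)
\;\Longleftrightarrow\;
b^{2} - 4c > 0 \ \vee\ b^{2} - 4c = 0 ,
\qquad\text{since}\qquad
x^{2} + b x + c = \Bigl(x + \tfrac{b}{2}\Bigr)^{2} - \tfrac{b^{2} - 4c}{4}.
\]
```

The right-hand side is a union of the solution set of an inequality and of an equation in $(b,c)$, so the projection is again semialgebraic, as 4.7 predicts.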

Note that a solution set of a one-variable polynomial inequality $f(x) > 0$ is
a finite union of open intervals $(a,b)$, where $-\infty \le a < b \le +\infty$. It follows that
any definable subset of R is a finite union of open intervals and points. This
property of an ordered structure is called o-minimality (order-minimality) and
will be discussed in later sections.

A semialgebraic function is a function with a semialgebraic graph. Again 
by the Tarski-Seidenberg theorem this is the same as definable in R using 
parameters. 

Suppose $g(x)$ is a semialgebraic function. By the above, we may assume
$\mathrm{Dom}\, g = (a,b)$. We may also assume that the graph $g(x) = y$ is defined just
by a conjunction of polynomial equations and inequalities over R. Clearly,
at least one of these must be an equation $p(x,y) = 0$. Write $p(x,y)$ as
$a_0(x)y^n + a_1(x)y^{n-1} + \cdots + a_n(x)$, for some $n > 0$ and $a_i(x) \in \mathbb{R}[x]$. It follows
that $y$ is one of the roots of the polynomial $a_0(x)y^n + a_1(x)y^{n-1} + \cdots + a_n(x)$.
On a subinterval $y$ can coincide with the greatest root of $p(x,y)$, the second
greatest one, and so on.

This proves the following. 


Fact. Given a definable (i.e., semialgebraic) function $g : \mathbb{R} \to \mathbb{R}$, its domain
can be divided into finitely many open intervals and points such that on each
interval or point, $g(x)$ is equal to the $k$th-greatest root of a polynomial $p(x, y)$,
for some $k \le \deg_y p$.


4.9 Application (L. Hörmander). Let $p(x_1,\dots,x_n)$ be a polynomial over R
and $f_p(r)$ the function of the nonnegative real variable $r$ defined as follows:

$$f_p(r) := \min\{p(x_1,\dots,x_n) : |x_1| + \cdots + |x_n| = r\}.$$

Assume that for any given positive real $R$ there is $r_0$ such that for $r > r_0$,

$$f_p(r) > R.$$
Then there are a positive rational number $a$ and a positive real $c$ such that

$$r^{-a} f_p(r) \to c \quad \text{as } r \to \infty.$$
PROOF. First note that $f_p$ is definable in the field of reals. So, by the fact
in 4.8, $f_p$ is defined piecewise, on finitely many intervals, by the formulas

$$f_p(r) = g_k(r) = \text{the $k$th root of the polynomial } q_0(r)y^m + q_1(r)y^{m-1} + \cdots + q_m(r)$$
for $q_0(r),\dots,q_m(r)$ polynomials in $r$.

We are interested in the interval $(d, \infty)$, for some large enough $d \in \mathbb{R}$, and may
assume that no $q_i(r)$ vanishes in the interval.
Consider a nonstandard model ${}^*\mathbb{R}$ of the theory and the following preordering
on ${}^*\mathbb{R}$:

$$\alpha \prec \beta \quad \text{if} \quad \frac{\alpha}{\beta} \text{ is infinitesimal}.$$

Set $\alpha \asymp \beta$ if neither $\alpha \prec \beta$ nor $\beta \prec \alpha$, equivalently, $(\alpha\beta^{-1} - c)$ is infinitesimal,
for some nonzero $c \in \mathbb{R}$ (see 2.13).

Let $\gamma \in {}^*\mathbb{R}$ be positive infinite, that is, $\gamma > r$ for any standard real $r$. Denote
$\delta = f_p(\gamma)$.

Then $\delta$ is a root of the polynomial $q_0(\gamma)y^m + q_1(\gamma)y^{m-1} + \cdots + q_m(\gamma)$, and
clearly this cannot happen unless

$$q_i(\gamma)\,\delta^{m-i} \asymp q_j(\gamma)\,\delta^{m-j}$$

for some $0 \le i < j \le m$.
Hence
$$\delta^{\,j-i} \asymp \frac{q_j(\gamma)}{q_i(\gamma)} \asymp \gamma^{N}$$
for $N = \deg q_j - \deg q_i$. It follows that $\delta \asymp \gamma^{a}$, for $a = N/(j-i)$. By definition,
this means

$$\gamma^{-a} f_p(\gamma) = c + \alpha \qquad (6)$$

for some $c \in \mathbb{R}$ and an infinitesimal $\alpha$. It remains to show that (6) holds for
the same $a$ and $c$ for every nonstandard infinite $\gamma$.
Note that (6) implies that for $x = \gamma$ the following $L_1\mathrm{Real}$-definable property
holds:
$$|x^{-a} f_p(x) - c| < 1. \qquad (7)$$

Hence, again by o-minimality, for all $x \in (d, \infty)$, for some $d \in \mathbb{R}$, (7) holds. If
for another choice of $\gamma$ we had different $a$ or $c$, then we would have (7) with the
different parameters holding on $(d', \infty)$, for some $d' \in \mathbb{R}$, clearly a contradiction.

It remains to see that $c > 0$ and $a > 0$. This is immediate from the assumption
on $f_p$.


4.10 Remark. A comment on the cause of efficiency of the method of proof
above and in other similar cases is in order. Quantifier elimination is in fact a
powerful calculus designed to translate complex formal expressions ($L$-formulas)
into something simple and, in many cases, geometrically meaningful. An example
of such an expression is the definition of the function $f_p$ in 4.9. Its conversion
into a semialgebraic function, if carried out “by hand,” is a painful process,
difficult to see through.

Note also that modern methods of elimination of quantifiers demonstrated 
in 3.18, 4.1, and 4.7 are more efficient and more “mathematical” than those 
of the 1950s. The initial instinct was to analyze the syntax of an arbitrary 
L-formula and get rid of quantifiers in the formula one by one in an inductive 
process. 


4.11 Decidability. The theories $\mathrm{ACF}_p$, for each $p$ prime or equal to 0, and the
theory RCF are decidable.
PROOF. These are just special cases of the following general statement: A
complete theory $T$ axiomatizable by a recursively enumerable set of axioms is
decidable. This is easy to see. Indeed, if there is an algorithm listing axioms of $T$
then it is easy to compile an algorithm that lists all consequences of the axioms,
that is, enumerates $T$. Now, given a sentence $P$ one can decide whether $P$ is in
$T$ by the following algorithm: list by the above algorithm formulas $Q_i$ of $T$ and
check at each step whether $P = Q_i$ or $\neg P = Q_i$. By completeness at some step
one or the other must happen and this obviously decides whether $P$ is in $T$.

The explicit axiomatizations of $\mathrm{ACF}_p$ and RCF are clearly recursive; hence
decidability follows.

The easy argument above can be adapted to prove decidability for some
incomplete theories, such as ACF, the theory of all algebraically closed fields.
This is axiomatized by the recursive set of axioms I and II of 4.1. And we also
know that if $P$ is not deducible from ACF, then $\neg P$ is consistent with some
$\mathrm{ACF}_p$, $p$ prime or 0. Note that there is an obvious enumeration of the family
$\mathrm{ACF}_p$, $p \in \mathrm{Primes} \cup \{0\}$: for each $p$ and number $n$ we can effectively produce an
axiom $S_{p,n} \in \mathrm{ACF}_p$, listing eventually all of the axioms. This can be extended
to an algorithm that for each $n \in \mathbb{N}$ produces formulas $P_{p,k}$, $p = 0$ or prime,
$k \in \mathbb{N}$, $p, k \le n$, such that $\{P_{p,k} : k \in \mathbb{N}\} = \mathrm{ACF}_p$.

Now, given a sentence $P$, turn on an algorithm that for a given $n \in \mathbb{N}$
produces


(i) $Q_1,\dots,Q_n$, the first $n$ consequences of ACF;
(ii) $P_{0,1},\dots,P_{0,n}$, $P_{2,1},\dots,P_{2,n}$, $\dots$, $P_{p,1},\dots,P_{p,n}$, for $p = 0$ or prime, $p \le n$.


We check at each step $n$ whether $P$ is in (i) or $\neg P$ is in (ii). One of these
two must happen at some stage $n$, and this decides correspondingly whether $P$
is deducible from ACF.
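The decision procedure for a complete, recursively enumerable theory can be sketched as follows. This is a minimal illustration under stated assumptions: sentences are plain strings, `negate` and `toy_theorems` are hypothetical stand-ins, and the "theory" is a toy complete set of order facts about natural numbers rather than $\mathrm{ACF}_p$ or RCF, whose theorem enumerators would require a proof system.

```python
from itertools import count

def decide(sentence, negate, enumerate_theorems):
    """Return True if `sentence` is enumerated, False if its negation is.

    Terminates whenever the enumerated theory is complete, because then one
    of the two sentences must eventually show up in the list.
    """
    negated = negate(sentence)
    for theorem in enumerate_theorems():
        if theorem == sentence:
            return True
        if theorem == negated:
            return False

def negate(sentence):
    """Toy syntactic negation: add or strip a leading '~'."""
    return sentence[1:] if sentence.startswith("~") else "~" + sentence

def toy_theorems():
    """Enumerate, for every pair (m, n), the true one of 'm<n' and '~m<n'."""
    for bound in count(1):
        for m in range(bound):
            for n in range(bound):
                yield f"{m}<{n}" if m < n else f"~{m}<{n}"

answer = decide("2<5", negate, toy_theorems)
```

The search is unbounded, so the procedure gives no running-time estimate; it only shows that completeness plus recursive enumerability yields decidability.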


4.12 The theory of p-adic numbers. 

The language for valued fields $L_{\mathrm{val}}$ has the symbols of field
theory, namely $0, 1, +, \cdot$, and a unary relation symbol $V$.

The theory $T\mathbb{Q}_p$ in this language is axiomatized as follows.


I. A model $F$ of the axioms carries a structure of a field of characteristic zero.

II. Axioms stating that $V$ singles out a maximal subring of the field $F$ (the
valuation ring), so $V(F)$ is a local ring with a unique maximal ideal $M(F)$.
We stipulate that $V(F)/M(F) \cong \mathbb{F}_p$, the $p$-element field. The canonical
homomorphism is denoted by res.

III. The value group $F^*/V^*(F) = \Gamma(F)$ is a Z-group, i.e., written additively,
has the same $(+,<)$ theory as the ordered group of integers $\mathbb{Z}$ ($<$ is the
order relation definable using the valuation ring). The infinitely many axioms
for a Z-group say that there exists a minimal positive element and
$n\Gamma$ is a subgroup of $\Gamma$ of index precisely $n$.

We denote by $v(x)$ the image of $x \in F^*$ under the canonical homomorphism.

We add the axiom stating that $v(p)$ is the minimal positive element of $\Gamma$.

IV. Axiom stating that Hensel’s lemma holds: for any polynomial $f(x)$ over
$V(F)$, if there is $a \in V(F)$ such that $\mathrm{res} f(a) = 0$ and $\mathrm{res} f'(a) \neq 0$, then there is
$a' \in V(F)$ with $f(a') = 0$ and $v(a' - a) > v(f'(a))$.
Thus one gets an axiomatization of the theory of $p$-adically closed fields,
namely, all the fields that are elementarily equivalent to $\mathbb{Q}_p$ in the language of
valued fields.
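Axiom IV can be illustrated in the model $\mathbb{Q}_p$ itself by the usual Newton iteration. The sketch below is hedged accordingly: it works with ordinary integers standing for elements of $\mathbb{Z}_p$ truncated modulo $p^n$, the function name `hensel_lift` is hypothetical, and it relies on Python 3.8+ for modular inverses via `pow(x, -1, m)`.

```python
def hensel_lift(f, df, root, p, n):
    """Lift a simple root of f modulo p to a root modulo p**n.

    f and df are integer-valued callables for the polynomial and its
    derivative; `root` must satisfy f(root) = 0 and df(root) != 0 modulo p.
    """
    if f(root) % p != 0 or df(root) % p == 0:
        raise ValueError("need f(root) = 0 and f'(root) != 0 modulo p")
    modulus = p
    for _ in range(n - 1):
        modulus *= p
        # f'(root) is a unit modulo p, hence modulo every power of p
        inverse = pow(df(root) % modulus, -1, modulus)
        root = (root - f(root) * inverse) % modulus   # one Newton step
    return root

# Example: a square root of 2 in Z_7, starting from 3 (3*3 = 9 = 2 mod 7).
r = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 4)
```

The successive values agree modulo increasing powers of $p$, so they converge $p$-adically to an element $a'$ with $f(a') = 0$ and $\mathrm{res}\, a' = \mathrm{res}\, a$, which is the content of the axiom in this model.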

For quantifier-elimination purposes one needs an extension of the language.
A. Macintyre introduced the extension by countably many unary predicates
$P_n$, $n \ge 2$; we call this extension $L^{\mathrm{Mac}}$. The axiomatic description of the new
predicates is given by axiom V:

$$\forall x\,[P_n(x) \leftrightarrow \exists y\,(y^n = x)].$$

That is, each $P_n$ singles out the set of $n$th powers in $F$.


Obviously, the last set of axioms does not impose any extra conditions on
the valued field.


4.13 Theorem (J. Ax–S. Kochen, Yu. Ershov, A. Macintyre). The theory
$T_{\mathbb{Q}_p}$ is complete, decidable, and allows elimination of quantifiers in the
language $L^{\mathrm{Mac}}_{\mathrm{valf}}$.

We do not give a proof of the theorem here. The model-theoretic meth- 
ods of known proofs are essentially the same as above but the algebra is much 
more involved. The first proofs of completeness, decidability, and elimination 
of quantifiers (in a different language) were given by Ax—Kochen in 1965. Inde- 
pendently, Yu. Ershov proved completeness and decidability. 

Macintyre proved elimination of quantifiers in the present form in 1976. Note
that in general, the choice of a language for quantifier elimination may be essential
for applications. The introduction of the predicates $P_n$ made the quantifier
elimination statement much more useful and powerful. The first consequence
of this quantifier elimination is the manifestation of similarities between the
theory of the reals and the theory of the p-adics. Recall that in the reals
the predicate $P_2(x)$, which of course means $x \ge 0$ in this context, is used
for the quantifier elimination statement. It is also useful to remark that this
predicate is basic for describing the topology and geometry over the reals.
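To make the comparison concrete, membership in the set of squares can be decided explicitly for integers in $\mathbb{Q}_p$ with $p$ odd: a nonzero $x$ is a square if and only if $v(x)$ is even and the unit part of $x$ is a square modulo $p$ (a square root modulo $p$ lifts to $\mathbb{Z}_p$ by Hensel’s lemma). A minimal sketch, with a function name of our own choosing and restricted to odd $p$ and integer arguments:

```python
def is_square_in_Qp(x, p):
    """Decide whether the integer x is a square in Q_p, for an odd prime p."""
    if p == 2:
        raise ValueError("this sketch only treats odd primes")
    if x == 0:
        return True  # 0 = 0**2
    v = 0
    while x % p == 0:  # strip the powers of p: x = p**v * unit
        x //= p
        v += 1
    if v % 2 == 1:
        return False  # squares have even valuation
    # Euler's criterion on the unit part; Hensel's lemma lifts a root mod p.
    return pow(x % p, (p - 1) // 2, p) == 1
```

For example, `is_square_in_Qp(-1, 5)` is true while `is_square_in_Qp(-1, 7)` is false, so unlike in the reals the predicate of squares does not single out the nonnegative elements of an ordering.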


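4.14 p-adic integration. Let $\mathbb{Z}_p$ denote the ring of p-adic integers.
Let $f_1(\bar x), \dots, f_r(\bar x)$ be polynomials in $m$ variables $\bar x = (x_1, \dots, x_m)$ over $\mathbb{Z}_p$.
For $n \in \mathbb{N}$, let $\tilde N_n$ be the number of elements in the set

$$\{\bar x \bmod p^n : \bar x \in \mathbb{Z}_p^m \text{ and } f_i(\bar x) \equiv 0 \bmod p^n, \text{ for } i = 1, \dots, r\},$$

and let $N_n$ be the number of elements in the set

$$\{\bar x \bmod p^n : \bar x \in \mathbb{Z}_p^m \text{ and } f_i(\bar x) = 0, \text{ for } i = 1, \dots, r\}.$$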


To these data one can associate the following Poincaré series:

$$\tilde P(T) = \sum_{n=0}^{\infty} \tilde N_n T^n, \qquad P(T) = \sum_{n=0}^{\infty} N_n T^n.$$


Borevich and Shafarevich conjectured that $\tilde P(T)$ is a rational function of $T$.
This was proved by Igusa in the case $r = 1$, and by Meuser for arbitrary $r$,
by adapting Igusa’s method. Serre and Oesterlé asked whether $P(T)$ is also
a rational function. Denef proved the rationality of both series using p-adic
quantifier elimination. This method was extended later for further applications.
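For a concrete instance, take $m = r = 1$ and $f(x) = x^2$. A residue $x$ modulo $p^n$ satisfies $x^2 \equiv 0 \bmod p^n$ exactly when $v(x) \ge n/2$, so the number of solutions modulo $p^n$ is $p^{\lfloor n/2 \rfloor}$ and the series of these counts is $(1 + T)/(1 - pT^2)$. The sketch below (function names are ours) checks this by brute force. The only p-adic zero of $f$ is $x = 0$, so the series counting truncations of genuine zeros is simply $1/(1 - T)$.

```python
def count_solutions_mod(f, p, n):
    """Number of residues x mod p**n with f(x) = 0 (mod p**n); one variable."""
    q = p ** n
    return sum(1 for x in range(q) if f(x) % q == 0)


def rational_series_coeffs(p, terms):
    """Taylor coefficients of (1 + T) / (1 - p*T**2), via c_n = p * c_(n-2)."""
    c = [1, 1]
    while len(c) < terms:
        c.append(p * c[-2])
    return c[:terms]


def brute_force_counts(p, terms):
    """Solution counts of x**2 = 0 modulo p**n for n = 0, ..., terms - 1."""
    return [count_solutions_mod(lambda x: x * x, p, n) for n in range(terms)]
```

With $p = 3$ both `brute_force_counts(3, 7)` and `rational_series_coeffs(3, 7)` give the list 1, 1, 3, 3, 9, 9, 27.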
For $a \in \mathbb{Q}_p$, let $|a| = p^{-v(a)}$. Let $|dx| = |dx_1| \cdot |dx_2| \cdots |dx_m|$ be the Haar
measure on $\mathbb{Q}_p^m$ such that the measure of $\mathbb{Z}_p^m$ is 1.
Igusa’s original proof starts by establishing a rational relation between the
integral

$$I(s) = \int_{\mathbb{Z}_p} |f(x)|^s\, |dx|,$$

as a function of $p^{-s}$, for $s \in \mathbb{R}$, $s > 0$, and $\tilde P(p^{-1-s})$. The calculation of the
integral is elementary in the case that $f(x)$ is a monomial, using the fact that
the function $|f(x)|$ is then constant on $p^n\mathbb{Z}_p \setminus p^{n+1}\mathbb{Z}_p$. In general, although $|f(x)|$
is still piecewise constant, it is quite hard to determine the absolute value on
the pieces. Here Igusa uses the embedded resolution of singularities of Hironaka.
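The elementary calculation for a monomial can be carried out exactly in the simplest case $f(x) = x$ over $\mathbb{Z}_p$. On the shell $p^n\mathbb{Z}_p \setminus p^{n+1}\mathbb{Z}_p$, of Haar measure $p^{-n}(1 - p^{-1})$, we have $|x|^s = p^{-ns}$, so the integral is a geometric series with sum $(1 - p^{-1})/(1 - p^{-1-s})$. The sketch below checks this with exact rational arithmetic for integer $s \ge 1$. It also checks the relation with the series of solution counts modulo $p^n$ (here $1/(1 - T)$, since $x \equiv 0$ is the only solution) in the form "series at $p^{-1}t$ equals $(1 - t\, I(s))/(1 - t)$ with $t = p^{-s}$". This is the form in which the relation is usually quoted and is an assumption here, since the text does not spell it out; all function names are ours.

```python
from fractions import Fraction


def igusa_closed_form(p, s):
    """(1 - 1/p) / (1 - p^(-1-s)): the integral of |x|^s over Z_p."""
    return (1 - Fraction(1, p)) / (1 - Fraction(1, p ** (1 + s)))


def igusa_partial_sum(p, s, shells):
    """Sum of |x|^s times Haar measure over the shells v(x) = 0, ..., shells-1."""
    total = Fraction(0)
    for n in range(shells):
        measure = Fraction(1, p ** n) * (1 - Fraction(1, p))  # measure of {v(x) = n}
        total += Fraction(1, p ** (n * s)) * measure           # |x|^s = p^(-n s) there
    return total


def series_side(p, s):
    """Solution-count series 1/(1 - T) for f(x) = x, evaluated at T = p^(-1-s)."""
    return 1 / (1 - Fraction(1, p ** (1 + s)))


def integral_side(p, s):
    """(1 - t*I(s)) / (1 - t) with t = p^(-s); assumed form of the relation."""
    t = Fraction(1, p ** s)
    return (1 - t * igusa_closed_form(p, s)) / (1 - t)
```

For $p = 5$ and $s = 2$ the closed form evaluates to $25/31$, and the two sides of the relation agree.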

Meuser’s proof extends these calculations to a similar integral over the
domain $\mathbb{Z}_p^m$.

A similar idea in the case of $P(T)$ leads to a p-adic integral over a more
complex domain. Denef considers the domain

$$D = \{(x_1, \dots, x_m, w) \in \mathbb{Z}_p^{m+1} : \exists \bar y \in \mathbb{Z}_p^m,\ \bar x \equiv \bar y \bmod w \text{ and } f_i(\bar y) = 0 \text{ for } i = 1, \dots, r\}$$

and the integral

$$I_D(s) = \int_D |w|^s\, |dx|\, |dw|.$$

Again, by elementary calculation

$$I_D(s) = \frac{p - 1}{p}\, P(p^{-(m+1)-s}).$$

So to prove that $P(T)$ is rational we need to prove that $I_D(s)$ is a rational
function of $p^s$. The main new difficulty here is the nonelementary shape of $D$,
but this is overcome by the use of Macintyre’s quantifier-elimination theorem.
It is sufficient to prove that an integral

$$\int_S |g(\bar x)|^s\, |dx|,$$

for a polynomial $g(\bar x)$ and a semialgebraic subset $S$ of $\mathbb{Z}_p^m$, is a rational function
of $p^s$. This can be done by essentially Igusa’s method. As was mentioned
above, this uses Hironaka’s resolution of singularities. But later Denef noticed
that a more thorough characterization of p-adic semialgebraic sets based on
earlier work by P. Cohen (the cell decomposition, widely used in the analysis
of real semialgebraic sets) allows one to prove the theorem without referring to
Hironaka.
The term classification theory usually refers to the body of work around
Shelah’s Classification Theory, the main idea of which is to place every
complete theory into a node of a hierarchical tree of stability theory. In its
broadest meaning the degree of stability is an indicator of tameness, or in
other words, the degree to which a structural classification of models of a
given complete theory can be developed. The study of o-minimal theories (now
also extended to the study of c-minimal, v-minimal, and others), another very
important part of model theory, is usually treated separately. But we include
o-minimality in this survey, seeing a justification for this both in its importance
and in its interactions with stability theory seen in recent years.


5.1 Categorical Theories. Classification theory has at its center theories cat- 
egorical in uncountable powers. Unless stated otherwise we assume throughout 
in this section that our languages are countable. 

Recall that a theory T is (absolutely) categorical if T has a unique, up to
isomorphism, model. By Löwenheim–Skolem this can be the case only when the
unique model of T is finite, while one really is interested in infinite structures.
So a more flexible notion of categoricity has been considered. We say that a
theory T is categorical in power $\mu$ ($\mu$-categorical) if T has a unique, up to
isomorphism, model of cardinality $\mu$. It is easy to see that if for an infinite
cardinal $\mu$, a theory T has no finite models and is $\mu$-categorical, then T is
complete. So $\mu$-categoricity is a stronger form of completeness.

It is interesting and appropriate to look at the phenomenon of $\mu$-categoricity
from the algebraic point of view. Suppose we are given an L-structure $A$ of
cardinality $\mu$ such that the L-theory Th($A$) is $\mu$-categorical. This can be translated
into a more suggestive characterization: the sentences of Th($A$) together
with the cardinal $\mu$ comprise a complete set of invariants for $A$. An especially
interesting case is that in which L is small (countable) and $\mu$ is large. This, in
effect, could be taken as a mathematical form of algorithmic compressibility,
the property of nature that some philosophers of science believe makes the laws
of the universe and science itself possible.

J. Łoś conjectured in the 1950s that if a theory T of a countable language
is $\mu$-categorical for some uncountable $\mu$, then it is $\mu$-categorical in all uncountable
powers (uncountably categorical). A decade later, M. Morley published a
seminal paper with a proof of Łoś’s conjecture.

One of the main new tools in Morley’s paper was the notion of a rank, a
function with certain properties assigning an ordinal number to each definable
set, which Morley proved exists for every uncountably categorical theory.


5.2 Stability. The point of departure in Morley’s analysis of a $\kappa$-categorical T
is the fact that the number of complete 1-types in T must be countable, and
moreover, given a set $C$ of new constant symbols naming some elements in a
model of T and the complete theory $T_C$ of this model in the extended language,
the number of complete types in the theory $T_C$ is at most $\mathrm{card}\, C + \aleph_0$. This
follows from the Ehrenfeucht–Mostowski theorem (see 3.17), immediately in
the case of types over T, and with a little more work in general. This property
of a theory T is called $\aleph_0$-stability. The term “stability” should be taken here
as the opposite to “diversity” of types of elements in models of T. The actual
terminology used to express this “diversity” is forking (also dividing, splitting,
and some others), and stability guarantees that forking does not go too far.

More generally, given an infinite cardinal number $\kappa$, a theory T is said to be
$\kappa$-stable if the expanded theory $T_C$ has at most $\kappa$ complete 1-types for every $C$
of cardinality $\kappa$.

A theory T is said to be stable if it is $\kappa$-stable for some infinite $\kappa$.

Shelah’s theory distinguishes several cases of stability. $\aleph_0$-stability
is the strongest one and implies $\kappa$-stability for all $\kappa$. Another possibility for
a stable theory T is that it is $\kappa$-stable for all $\kappa \ge 2^{\aleph_0}$. In this case T is said
to be superstable. In the remaining cases T is stable in all cardinals except for those
of low cofinality.

Note that the definitions above remain equivalent if one replaces 1-types by 
n-types. 


5.3 Morley rank. The following definition makes sense for $\aleph_0$-stable theories.

Let M be a universal domain for T and $\mathrm{Def}_n(M)$ the collection of all
nonempty subsets of $M^n$ definable with parameters in M. Morley rank is
the minimal function $\mathrm{rk} : \mathrm{Def}_n(M) \to \mathrm{Ord}$ (ordinal numbers) satisfying the
following:

$\mathrm{rk}\, S \ge \alpha + 1$ if and only if there is a countable family $\{S_i : i \in \mathbb{N}\}$ of pairwise
disjoint subsets of $S$ with $\mathrm{rk}\, S_i \ge \alpha$;
for a limit $\delta$, $\mathrm{rk}\, S \ge \delta$ if and only if $\mathrm{rk}\, S \ge \alpha$ for all $\alpha < \delta$.


In effect, since M is at least $\aleph_0$-saturated, the definition does not depend
on M. The fact that one can assign an ordinal $\mathrm{rk}\, S$ with the above property
to every definable set $S$ is due to the bound on the number of possible types,
that is, $\aleph_0$-stability of T. A simple combinatorial argument proves that a priori
$\mathrm{rk}\, S < \aleph_1$. A much more difficult theorem (J. Baldwin) established later says
that for uncountably categorical T, $\mathrm{rk}\, S$ is always a finite number. Moreover, in
this case the rank enjoys the following addition formula:

Let $\mathrm{pr} : M^{n+m} \to M^n$ be the projection $(x_1, \dots, x_n, \dots, x_{n+m}) \mapsto (x_1, \dots, x_n)$
and $S \in \mathrm{Def}_{n+m}(M)$. Then

$$\mathrm{rk}\, \mathrm{pr}\, S + \min_{a \in \mathrm{pr}\, S} \mathrm{rk}\, S_a \le \mathrm{rk}\, S \le \mathrm{rk}\, \mathrm{pr}\, S + \max_{a \in \mathrm{pr}\, S} \mathrm{rk}\, S_a,$$

where $S_a$ is the fiber over $a$.


5.4 Example. The theory $\mathrm{ACF}_p$ of algebraically closed fields of characteristic
$p$ is $\mu$-categorical for uncountable $\mu$, since the isomorphism type of a model
$F$ of $\mathrm{ACF}_p$ is, by Steinitz’s theorem, determined by $\mathrm{trd}\, F$, the transcendence
degree of $F$, and $\mathrm{trd}\, F = \mathrm{card}\, F$ for uncountable $F$. Recall that definable sets in
this structure are just constructible sets. So algebrogeometric dimension, $\dim S$,
is well defined for any definable set $S$. One can easily check (by induction on
$\dim S$) that

$$\mathrm{rk}\, S = \dim S$$

in this case.

Stability is inherited by definable structures: a structure definable in a stable
structure is itself stable. Following 3.3, the group $\mathrm{GL}_n(F)$ (in the language of groups) is definable
in the field $F$. So the theory of $\mathrm{GL}_n(F)$ is $\aleph_0$-stable and rk is well defined in
this theory. In fact, as in the previous example, every definable set $S$ in this
theory is constructible and $\mathrm{rk}\, S = \dim S$.

There are many more structures definable in $\mathrm{ACF}_p$. A natural class of
examples is that of algebraic varieties as structures in the natural language for
algebraic varieties: let $V = V(F)$ be the set of $F$-points of an algebraic variety
defined over some $C \subseteq F$, $F$ a model of $\mathrm{ACF}_p$. For each $n$ and each $C$-definable
subvariety $W \subseteq V^n$ introduce the symbol $p_W$ of an $n$-ary predicate on $V$. The
natural language for the algebraic variety $V$ consists of all the $p_W$ for all $W$ as
above. The structure $V$ on the domain $V(F)$ with the obvious interpretation
of the predicates of the natural language is definable in the field $F$. Its theory
is $\aleph_0$-stable and, for $C$ big enough (e.g., if $C$ contains an algebraically closed
subfield), Morley rank coincides with dimension. In general $\mathrm{rk}\, S \le \dim S$ for
constructible sets $S$ definable in $V$.


5.5 On the other hand, RCF, the theory of the field of reals $\mathbb{R}$, is not stable.
In fact, every theory T with an order relation definable on an infinite subset
in its model is not stable. Indeed, first note that by the method of diagrams,
we can embed into a model of T any ordered set $(C, <)$. It is known that for
every infinite $\kappa$ there is an ordered set $(C, <)$ of cardinality $\kappa$ with more than
$\kappa$ Dedekind cuts in it. Distinct cuts give rise to distinct complete types in the
theory $T_C$, which shows that T is not $\kappa$-stable.

In the above case one says that the theory T has the strict order property. 

It is not difficult to see that the theory $T_{\mathbb{Q}_p}$ of the p-adically valued field has
the strict order property (use the order on the value group). With more analysis
of definability one can see that the theory of the field $\mathbb{Q}_p$ in the language of
fields alone also has the strict order property, so it is unstable.


5.6 Another pattern of nonstability can be seen in the example of a pseudofinite
field. By definition this is any infinite field $F$ that is elementarily equivalent to an
ultraproduct $\prod_{i \in I} F_i / U$ of finite fields $F_i$. The study of such fields was started
by Ax and Kochen and it is known that they do not have the strict order
property but do satisfy another property that implies nonstability.

One says that a complete theory T has the independence property if there is
a formula $P(\bar x, \bar y)$ in the language of the theory such that for every $n$, in some
model $A$ of the theory one can find $n$ tuples $\bar c_j$, $j \in \{1, \dots, n\}$, and $2^n$ tuples
$\bar b_J$, $J \subseteq \{1, \dots, n\}$, such that

$$A \models P(\bar b_J, \bar c_j) \text{ iff } j \in J.$$

Clearly, the number of complete $m$-types ($m = \mathrm{length}(\bar x)$) over the parameters
$\bar c_1, \dots, \bar c_n$ is at least $2^n$. Using compactness one finds in a universal domain
for T a subset $C$ of cardinality $\kappa$ with at least $2^\kappa$ complete types over $C$.
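The combinatorial pattern in this definition can be observed in large finite fields, which is where the independence property of pseudofinite fields comes from. The following sketch takes for $P(x, y)$ the formula $\exists z\, (z^2 = x + y)$; this choice of formula, the parameters $c_j = j$, and the function names are assumptions made for illustration, and a computation in finite fields only illustrates, and does not prove, the statement about pseudofinite fields. For a given $n$ the code searches for a prime $q$ such that each of the $2^n$ patterns is realized in $\mathbb{F}_q$.

```python
def primes_from(start):
    """Yield the primes >= start, by trial division (adequate for small primes)."""
    q = max(start, 2)
    while True:
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):
            yield q
        q += 1


def ip_witnesses(n):
    """Find a prime q, parameters c_0..c_(n-1) and elements b_J of F_q such that
    b_J + c_j is a square in F_q exactly when j belongs to J."""
    for q in primes_from(max(3, n + 1)):
        squares = {z * z % q for z in range(q)}
        cs = list(range(n))  # hypothetical choice of parameters: c_j = j
        found = {}
        for b in range(q):
            # The subset J of indices picked out by b.
            J = frozenset(j for j, c in enumerate(cs) if (b + c) % q in squares)
            found.setdefault(J, b)
        if len(found) == 2 ** n:  # every subset J has a witness b_J
            return q, cs, found
```

The returned dictionary maps each subset $J$ of indices to a witness $b_J$. Standard character sum estimates guarantee that a suitable $q$ exists for every $n$, so the search terminates, although it becomes slow as $n$ grows.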


5.7 Indiscernibles and orthogonality. A subset $I$ in an L-structure $A$ is
said to be indiscernible over a set of parameters $C$ ($C$-indiscernible) if for any
$L_C$-formula $P(x_1, \dots, x_n)$ either $A \models P(i_1, \dots, i_n)$ for all distinct $i_1, \dots, i_n \in I$
or $A \models \neg P(i_1, \dots, i_n)$ for all distinct $i_1, \dots, i_n \in I$.

An instructive example of an indiscernible set I is an algebraically indepen- 
dent subset of an algebraically closed field. 

A characteristic feature of a stable theory T is that in saturated models of 
T, indiscernibles (over a given small set of parameters) are ubiquitous. 

In particular, choosing $C = A$ to be a set of parameters naming all elements
of a small (say, countable) model $A$ and $B \succeq A$ a saturated enough model,
every 1-type $p$ over $A$ can be defined by an $A$-indiscernible set $I \subseteq B$:

(i) every $a \in I$ realizes $p$;
(ii) for every $C$, $A \subseteq C \subseteq B$, the average type of $I$ over $C$,

$$\mathrm{Av}_C(I) = \{Q(x) : Q \text{ an } L_C\text{-formula s.t. } B \models Q(i) \text{ for all but finitely many } i \in I\},$$

is a nonforking extension of $p$.

Consequently, in general the cardinality of a maximal $A$-indiscernible subset
$I = I_p \subseteq B$ such that $p = \mathrm{Av}_A(I_p)$ is an important cardinal invariant of
$(p, B)$ over $A$. For instance, in the above example with fields, $\mathrm{card}\, I_p$ is exactly
$\mathrm{trd}(B/A)$, the transcendence degree of the field $B$ over $A$. The average type
of an algebraically independent subset of a field is called the generic type of
the field.

Two types $p$ and $q$ over $A$ are said to be nonorthogonal if $\mathrm{card}\, I_p = \mathrm{card}\, I_q$
in every model $B \succeq A$; otherwise, the types are said to be orthogonal. Clearly,
nonorthogonality is an equivalence relation. A theory is called unidimensional
if any two 1-types over a model are nonorthogonal.

Every uncountably categorical theory is unidimensional. On the other hand,
it is easy to construct a stable theory with “many dimensions.” For example,
the theory of the direct product $A_1 \times A_2$ of two algebraically closed fields is
$\omega$-stable but “two-dimensional.” It has models with any combination of $\mathrm{trd}\, A_1$
and $\mathrm{trd}\, A_2$.

Particularly interesting and essential is the analysis of the orthogonality 
relation in the theory of differentially closed fields. 


5.8 Differentially closed fields. A structure $(K, +, \cdot, 0, 1, D)$ in the language
of fields extended by an operation symbol $D : K \to K$ is called a differential
field if $K$ is a field of characteristic 0 and $D$ is additive and satisfies the Leibniz
rule $D(xy) = xDy + yDx$. A differential polynomial of order $\le n$ in the variable $y$ is an
expression $f(D^n y, D^{n-1} y, \dots, y)$, where $f(x_0, x_1, \dots, x_n)$ is a polynomial over
$K$. A differential field is said to be differentially closed if for every $n > 0$
and every differential polynomial $g(y)$ of order $n$ and a nonzero differential
polynomial $h(y)$ of order $< n$ there is an $s \in K$ such that $g(s) = 0$ and
$h(s) \ne 0$. (Note that the field $(K, +, \cdot)$ is then algebraically closed.) This is easily
axiomatized in the first-order language, and the corresponding theory is called
$\mathrm{DCF}_0$. This theory was studied by A. Robinson, who proved that it is complete
and has elimination of quantifiers. Later, C. Wood observed that the theory is
$\omega$-stable of Morley rank $\omega$. The fact that the rank is infinite agrees well with the
intuition that $K$ is an infinite-dimensional space over the one-dimensional field
of constants $C_K = \{y \in K : Dy = 0\}$. Indeed, $\mathrm{rk}\, C_K = 1$. Moreover, in general
the solution space $S_f = \{y \in K : f(D^n y, D^{n-1} y, \dots, y) = 0\}$ of a differential
equation of order $n$ is of rank $n$.
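As a toy illustration of the definition, the polynomial ring $\mathbb{Q}[x]$ with $D = d/dx$ satisfies the Leibniz rule, and its constants are exactly the constant polynomials. This is only a differential ring (its fraction field $\mathbb{Q}(x)$ is a differential field, and neither is differentially closed); the representation of polynomials as coefficient lists and the helper names below are our own.

```python
def poly_add(f, g):
    """Sum of two polynomials given as coefficient lists, lowest degree first."""
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(n)]


def poly_mul(f, g):
    """Product of two nonempty coefficient lists."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


def D(f):
    """Formal derivative d/dx."""
    return [i * a for i, a in enumerate(f)][1:] or [0]


def trim(f):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    f = list(f)
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f


def leibniz_holds(f, g):
    """Check D(fg) = f*Dg + g*Df for the given pair of polynomials."""
    lhs = D(poly_mul(f, g))
    rhs = poly_add(poly_mul(f, D(g)), poly_mul(g, D(f)))
    return trim(lhs) == trim(rhs)
```

For example, `leibniz_holds([1, 2, 3], [0, 1, 0, 4])` checks the rule for $1 + 2x + 3x^2$ and $x + 4x^3$, and `D([7])` is the zero polynomial, in line with the description of the constants.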

Now let us compare the generic type of $S_f$ (appropriately defined) with the
generic type of the field of constants. The nonorthogonality of the two types
can be translated into the statement that the solution space $S_f$ is parametrized
by the field of constants $C_K$. A typical example is given by a linear differential
equation $f(D^n y, D^{n-1} y, \dots, y) = 0$, where $S_f$ is in a definable bijective
correspondence with the linear space $C_K^n$.

On the other hand, for a generically chosen differential equation $f$, the
definable set $S_f$ is orthogonal to the constants.


5.9 Example. Algebraically closed difference fields. A difference field is
a structure $(K, +, \cdot, 0, 1, \sigma)$, with $(K, +, \cdot)$ a field and $\sigma$ an automorphism of
the field. A difference field $(K, +, \cdot, \sigma)$ is said to be algebraically closed (also
existentially closed) if any finite set of quantifier-free formulas over $K$ that has
a solution in some extension of $K$ has a solution in $K$. Hrushovski proved that
this definition is axiomatizable and that the theory of a given algebraically
closed difference field of characteristic zero, although unstable, is simple.

It is useful to observe the many similarities between this theory (also called
ACFA, algebraically closed field with an automorphism) and the theory $\mathrm{DCF}_0$.
The fixed field $F = \{x \in K : \sigma x = x\}$ is a direct analogue of the field of
constants, and is known to be of rank 1 (so-called SU-rank, in the case of
simple theories). Given a polynomial $f$ over $K$, the solution set $S_f = \{y \in K :
f(\sigma^n y, \sigma^{n-1} y, \dots, y) = 0\}$ of a difference equation of order $n$ is of rank $n$.
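A small finite example of a difference field, and of its fixed field, is the field with $p^2$ elements together with the Frobenius automorphism $\sigma x = x^p$. The sketch below realizes the field with 49 elements as $\mathbb{F}_7[i]$, $i^2 = -1$ (possible since $7 \equiv 3 \bmod 4$); the choice $p = 7$ and all names are assumptions for illustration. A finite difference field is of course far from algebraically closed in the sense above; the code only checks that $\sigma$ is an automorphism and that its fixed field is $\mathbb{F}_7$.

```python
from itertools import product

P = 7  # hypothetical choice; any prime congruent to 3 mod 4 makes F_P[i] a field


def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def mul(x, y):
    """(a + b i)(c + d i) with i*i = -1, coefficients modulo P."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def sigma(x):
    """Frobenius map x -> x**P, computed by repeated multiplication."""
    result = (1, 0)
    for _ in range(P):
        result = mul(result, x)
    return result


K = list(product(range(P), repeat=2))            # all elements a + b i
fixed_field = [x for x in K if sigma(x) == x]    # {x : sigma(x) = x}


def is_automorphism():
    additive = all(sigma(add(x, y)) == add(sigma(x), sigma(y)) for x in K for y in K)
    multiplicative = all(sigma(mul(x, y)) == mul(sigma(x), sigma(y)) for x in K for y in K)
    bijective = len({sigma(x) for x in K}) == len(K)
    return additive and multiplicative and bijective
```

Here $\sigma$ is complex conjugation $a + bi \mapsto a - bi$, and the fixed field consists of the seven elements with $b = 0$.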

For some definable sets more can be said, e.g., the solution set $T_m$ for the
equation $\sigma y = y^m$, for $m > 1$, is of Morley rank 1 and the set is orthogonal
to the fixed field. This set contains important Diophantine information: in an
algebraically closed difference field $K$ any root of unity of order $n$, prime to
$m$, belongs to the set $T_m$. Indeed, the equations $y^n = 1$ and $\sigma y = y^m$ have
a solution in an algebraically closed difference field, since there is a Galois automorphism
taking a root $y$ of order $n$ to $y^m$.


5.10 Shelah’s criterion of stability. A complete theory T is stable if and 
only if it does not have the strict order property or the independence property. 

We saw already that each of the properties implies nonstability. The converse
is a nontrivial and powerful statement proved by Shelah using beautiful infinite
combinatorics, characteristic of many proofs in this field.

Negation of either of the two properties for a theory T is seen as an indicator
of tameness of T. A theory T is said to be simple if it does not have the strict
order property.

The theory of a pseudofinite field is simple. 

A theory T is said to be dependent (or NIP, nonindependence property) if 
it does not have the independence property. 

The theory $T_{\mathbb{Q}_p}$ of the p-adics is dependent (Shelah–Hrushovski). A large
class of dependent theories is the class of o-minimal ones.


5.11 o-minimal theories. A complete theory T is said to be o-minimal if any
model $A$ of T is linearly ordered by a definable relation $<$ and every subset
of $A$ definable with parameters is a union of finitely many open intervals and
points (A. Pillay, C. Steinhorn, and L. van den Dries).

The property of o-minimality implies a rich structural theory. One of the 
consequences of the theory is the fact that an o-minimal theory is dependent. 

We have mentioned above that the theory RCF is o-minimal. A seminal
theorem of A. Wilkie establishes o-minimality of the theory of $\mathbb{R}_{\exp} =
(\mathbb{R}, +, \cdot, \exp, 0, 1)$, the field of reals with exponentiation. One of the corollaries
of this theorem is the fact proved earlier by A. Khovanski that a zero-set
of a system of exponential-polynomial equations has finitely many connected
components.

Many more expansions of R by classical analytic functions have been proved 
to be o-minimal, and o-minimal analysis today has become a broadly used tool 
of real analytic geometry. 


6 Geometric Stability Theory 


6.1 Strongly minimal sets and pregeometries. In analyzing models of 
uncountably categorical theories (and more generally) and their definable sub- 
structures with regard to the nonorthogonality relation, one realizes the special 
role played by the minimal ones. 

A structure $M$ is said to be minimal if $\mathrm{rk}\, M = 1$ and for any partition
$M = S_1 \cup S_2$ into subsets definable using parameters, $\mathrm{rk}\, S_1 = 0$ or $\mathrm{rk}\, S_2 = 0$.
$M$ is said to be strongly minimal if every $M' \succeq M$ is minimal. This is also applicable
when $M$ is a definable subset in an ambient structure $A$. One treats the
set $M$ as the domain of a structure $M$ with relations on $M$ induced from $A$.
In this case one usually calls $M$ a strongly minimal set. In algebraic geometry,
or rather the theory $\mathrm{ACF}_p$, the strongly minimal subsets of $F^n$ are (irreducible)
algebraic curves with a finite number of points added or removed.

It is not difficult to prove that the theory of a strongly minimal M is 
uncountably categorical. 

In an arbitrary L-structure A one defines the notion of an (abstract) alge- 
braic closure cl. 

Given a subset $U \subseteq A$ and a point $v \in A$ we say that $v \in \mathrm{cl}(U)$ ($v$ belongs
to the algebraic closure of $U$) if there is an $L_U$-formula $P(x)$ such that the
definable set $P(A)$ is finite and contains the point $v$.

Again, in $\mathrm{ACF}_p$ and in RCF the abstract algebraic closure is the usual
field-theoretic algebraic closure.

It is easy to check that in any structure the following properties hold: 


(i) $U \subseteq V$ implies $U \subseteq \mathrm{cl}(U) \subseteq \mathrm{cl}(V)$;
(ii) $\mathrm{cl}(\mathrm{cl}(U)) = \mathrm{cl}(U)$.


Less obvious is the following property, the exchange principle, which holds 
in any strongly minimal structure M: 


(iii) For any $U \subseteq M$ and elements $v, w \in M$,

$$w \in \mathrm{cl}(U, v) \setminus \mathrm{cl}(U) \Rightarrow v \in \mathrm{cl}(U, w).$$

(Here and below, $\mathrm{cl}(U, v) := \mathrm{cl}(U \cup \{v\})$.)
Note also that the operator cl is finitary, in the sense that

$$\mathrm{cl}(U) = \bigcup \{\mathrm{cl}(U') : \text{finite } U' \subseteq U\}.$$


We say that (M,cl) is a (combinatorial) pregeometry if cl is a finitary oper- 
ator satisfying (i)—(iii). 
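These axioms can be checked by brute force in a small example. The sketch below takes $M = \mathbb{F}_2^3$ with cl the linear span (the pregeometry of a vector space); the encoding of vectors as 3-bit integers and the function names are our own choices. The operator is automatically finitary here since $M$ is finite.

```python
from itertools import combinations

POINTS = range(8)  # the vector space F_2^3, vectors encoded as 3-bit integers


def cl(U):
    """Linear span over F_2 of the vectors in U (addition is bitwise XOR)."""
    span = {0}
    for u in U:
        span |= {s ^ u for s in span}
    return frozenset(span)


def subsets(points):
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def satisfies_closure_axioms():
    all_sets = subsets(POINTS)
    # (i) U subset of V implies U subset of cl(U) subset of cl(V)
    monotone = all(U <= cl(U) <= cl(V)
                   for U in all_sets for V in all_sets if U <= V)
    # (ii) cl(cl(U)) = cl(U)
    idempotent = all(cl(cl(U)) == cl(U) for U in all_sets)
    # (iii) exchange: w in cl(U, v) minus cl(U) implies v in cl(U, w)
    exchange = all(v in cl(U | {w})
                   for U in all_sets for v in POINTS for w in POINTS
                   if w in cl(U | {v}) - cl(U))
    return monotone and idempotent and exchange
```

Calling `satisfies_closure_axioms()` runs through all 256 subsets of $M$ and confirms (i), (ii) and the exchange principle for this example.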

A (combinatorial) geometry is a pregeometry (M,cl) such that cl(u) = {u} 
for any u € M. This notion, known also under the names matroid and depen- 
dence relation, was used in combinatorics and in algebra by van der Waerden 
to develop a unified theory of dependence relations such as linear dependence 
and algebraic dependence in fields. 

Given a pregeometry $(M, \mathrm{cl})$ one associates the geometry $(\hat M, \mathrm{cl})$ with it by
setting $\hat M$ to be $M \setminus \mathrm{cl}(\emptyset)$ factored by the equivalence relation $u \sim v \Leftrightarrow \mathrm{cl}(u) =
\mathrm{cl}(v)$.

On the other hand, one can modify a pregeometry $(M, \mathrm{cl})$ by replacing the
closure operator cl with $\mathrm{cl}_a$, for a fixed element $a \in M$, defined as

$$\mathrm{cl}_a(U) := \mathrm{cl}(U, a) \text{ for any } U \subseteq M.$$

The new pregeometry $(M, \mathrm{cl}_a)$ is then called the localization of $(M, \mathrm{cl})$ at $a$. The
model-theoretic meaning of localization is just the extension of the language by
a symbol for $a$.


6.2 Dimension in a pregeometry. A set $U \subseteq M$ is said to be independent if
$\mathrm{cl}(U) \ne \mathrm{cl}(U')$ for any proper subset $U' \subset U$.

A maximal independent subset of M is said to be a basis of M. 

It is easy to prove that any two bases of a pregeometry M are of the same 
cardinality, which is called the dimension of M. More generally, we denote by 
$d(X)$, for $X \subseteq M$, the dimension of the subspace $\mathrm{cl}(X)$ of the pregeometry $M$.
When working with strongly minimal and more generally stable structures, it is 
important to distinguish this notion from other notions of dimension, such as the 
Morley rank. For these reasons we sometimes say combinatorial dimension for 
the dimension of a pregeometry. Note, however, that there is a deep relationship 
between the combinatorial dimension and ranks, in particular the Morley rank. 
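The invariance of the cardinality of a basis can be observed directly in the pregeometry of linear span on $\mathbb{F}_2^3$. The sketch below uses the definition of independence given above literally (an independent set is one whose closure differs from the closure of each of its proper subsets) and enumerates all bases; the encoding of vectors as 3-bit integers and the names are assumptions for illustration.

```python
from itertools import combinations

POINTS = list(range(8))  # F_2^3, vectors encoded as 3-bit integers


def cl(U):
    """Linear span over F_2 (addition is bitwise XOR)."""
    span = {0}
    for u in U:
        span |= {s ^ u for s in span}
    return frozenset(span)


def proper_subsets(U):
    U = list(U)
    return [frozenset(c) for r in range(len(U)) for c in combinations(U, r)]


def is_independent(U):
    """cl(U) differs from cl(U') for every proper subset U' of U."""
    return all(cl(V) != cl(U) for V in proper_subsets(U))


def bases():
    """All maximal independent subsets of M."""
    independent = [frozenset(c) for r in range(len(POINTS) + 1)
                   for c in combinations(POINTS, r) if is_independent(c)]
    # Subsets of independent sets are independent, so it suffices to test
    # whether U can be enlarged by a single point.
    return [U for U in independent
            if not any(is_independent(U | {x}) for x in POINTS if x not in U)]
```

All 28 bases found in this way have three elements, so the dimension of this pregeometry is 3.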

For a definable set $S \subseteq M^n$ in a strongly minimal structure $M$,

$$\mathrm{rk}\, S = \max\{d(x_1, \dots, x_n) : (x_1, \dots, x_n) \in S\}.$$


Pregeometries (M,cl) induced by strongly minimal structures M have the 
following crucial property, called homogeneity: 

Every bijection between two bases of (M,cl) can be extended to an automor- 
phism of the pregeometry. 


6.3 Examples 


(1) Let M be a trivial infinite structure, that is, an infinite set considered as
a structure in the trivial language (the only predicate is equality). This is
a strongly minimal structure with the pregeometry given by the trivial
closure operator, cl(U) = U for every set U.




(2) Let $A = (A, +, 0)$ be an abelian divisible group satisfying the assumption
that for each positive integer $n$ the equation $nx = 0$ has finitely many
solutions in $A$. This structure is strongly minimal and its theory has elimination
of quantifiers. The closure operator cl is the same as the linear
closure, that is, $\{u_1, \dots, u_k\}$ is dependent in the sense of cl if and only if
$m_1 u_1 + \cdots + m_k u_k = 0$ for some nonzero string of integers $m_1, \dots, m_k$. This
example can be generalized by considering $K$-modules for arbitrary division
rings $K$ instead of $\mathbb{Q}$.

Observe that the geometry associated with the pregeometry $(A, \mathrm{cl})$ is the
projective geometry over $K$ (projective space $P^\kappa(K)$), where $\kappa$ is the cardinal
number equal to the dimension of $A$.


(3) An algebraically closed field F is a strongly minimal structure that is a 
pregeometry with respect to the (field-theoretic) algebraic closure. 


The pregeometries (1) and (2) satisfy the property called modularity:

$$w \in \mathrm{cl}(U, v) \Rightarrow \exists u \in \mathrm{cl}(U) : w \in \mathrm{cl}(u, v).$$


Example (3) is not modular. One says that $(M, \mathrm{cl})$ is locally modular if a
localization $(M, \mathrm{cl}_a)$, for some $a \in M$, is modular.
An example of a locally modular but not modular pregeometry is an affine
geometry over a field $K$, that is, a $K$-vector space $V$ with a set $\{v_0, v_1, \dots, v_n\} \subseteq
V$ considered dependent if and only if $\{v_1 - v_0, \dots, v_n - v_0\}$ is $K$-linearly
dependent.
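The difference between modular and locally modular pregeometries can be seen in a small case, the affine plane over $\mathbb{F}_3$. In the sketch below (names and encoding are ours) the modularity condition is tested only for nonempty $U$, an assumption we make because $\mathrm{cl}(\emptyset)$ is empty in an affine geometry. Two parallel lines witness the failure of modularity, while the localization at the origin is the pregeometry of linear span and passes the test.

```python
from functools import lru_cache
from itertools import combinations, product

Q = 3
POINTS = list(product(range(Q), repeat=2))  # the affine plane over F_3
ORIGIN = (0, 0)


@lru_cache(maxsize=None)
def affine_cl(U):
    """Affine span of a frozenset U: close U under lines through two points."""
    closed = set(U)
    changed = True
    while changed:
        changed = False
        for a, b in list(combinations(closed, 2)):
            for t in range(Q):
                pt = tuple((x + t * (y - x)) % Q for x, y in zip(a, b))
                if pt not in closed:
                    closed.add(pt)
                    changed = True
    return frozenset(closed)


def localized_cl(U):
    """Localization at the origin: cl_a(U) = cl(U, a) with a = ORIGIN."""
    return affine_cl(frozenset(U) | {ORIGIN})


def is_modular(cl):
    """w in cl(U, v) implies w in cl(u, v) for some u in cl(U); nonempty U only."""
    for r in range(1, len(POINTS) + 1):
        for U in map(frozenset, combinations(POINTS, r)):
            for v in POINTS:
                for w in cl(U | {v}):
                    if not any(w in cl(frozenset({u, v})) for u in cl(U)):
                        return False
    return True
```

With $U = \{(0,0), (1,0)\}$, $v = (0,1)$ and $w = (1,1)$, the point $w$ lies in $\mathrm{cl}(U, v)$, the whole plane, but the line through $v$ and $w$ is parallel to $\mathrm{cl}(U)$, so no $u \in \mathrm{cl}(U)$ has $w \in \mathrm{cl}(u, v)$.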


All the pregeometries listed above are homogeneous. Note that, for example, 
the pregeometry of algebraic dependence on the reals R is not homogeneous. 
As a matter of fact, it is very hard to find a homogeneous pregeometry not 
reducible to (1)—(3) in an obvious way. The only examples known today come 
from a construction by E. Hrushovski, which will be discussed below. 


6.4 Weak trichotomy theorem. Let M be a strongly minimal structure and
(M, cl) the pregeometry induced by it. Then one and only one of the following
holds:


(i) the geometry associated with $(M, \mathrm{cl})$ is trivial;
(ii) the geometry associated with $(M, \mathrm{cl}_a)$, a localization of $(M, \mathrm{cl})$, is isomorphic
to a projective geometry over a (countable) division ring;
(iii) there is a pseudoplane definable in M.


We need to explain (iii). A pseudoplane (first considered by A. Lachlan) is a
structure on two infinite domains $P$ and $L$ with a binary relation $I$ between the
domains. Elements of $P$ are called points, elements of $L$ lines, and $I$ is called an
incidence relation. We may associate with any $\ell \in L$ the set of points incident
to $\ell$, and one of our assumptions is that distinct lines correspond to distinct
such sets. Our definition here is narrower than Lachlan’s original one. The
assumptions are:

- the structure $(P, L, I)$ is $\omega$-stable with $\mathrm{rk}\, P = 2 = \mathrm{rk}\, L$;

- the set of points incident to a given line is of rank 1;

- the set of lines incident to a given point is of rank 1;

- every two lines intersect in at most finitely many points;

- through any two points pass finitely many lines.

An example of a pseudoplane is an algebraic surface P (not necessarily 
closed) with a 2-dimensional family L of curves on it (F-points of these, for F 
algebraically closed). Removing, if necessary, exceptional 1-dimensional subsets 
from P and L one can always get the above conditions satisfied. 

A special case of a pseudoplane is an abstract affine (or projective) plane, 
well known to combinatorial geometers. It is a classical theorem that any such 
plane, if it satisfies a combinatorial Desargues theorem, is definably equivalent 
to a division ring $F$. Under the assumptions of $\omega$-stability such a division ring
has to be an algebraically closed field. Thus, the weak trichotomy theorem
suggests that the pregeometries of the three examples in 6.3 are the only ones
possible. This was proposed as the trichotomy conjecture by the present author.
Observe that when one assumes local finiteness of a strongly minimal structure
$M$, that is, that $\mathrm{cl}(U)$ is finite for finite $U \subseteq M$, then the type 6.3(3) pregeometry
is excluded: algebraically closed fields are not locally finite. So the following
supports the trichotomy conjecture.


6.5 Theorem. An infinite locally finite homogeneous geometry is isomorphic
to one of the following:


(i) trivial geometry;
(ii) projective geometry over a finite field;
(iii) affine geometry over a finite field.

Note that (ii) and (iii) are special cases of 6.4(ii).

The proof of the theorem is based, as is the proof of 6.4, on a combinatorial- 
geometric analysis, using delicate calculations with model-theoretic ranks. The 
main target of the proof is to exclude the possibility of a pseudoplane. One 
develops an intersection theory on a pseudoplane (akin to Bézout’s theorem) 
and arrives at a numerical contradiction. 

A refinement of this method led to a similar classification of all finite
homogeneous geometries starting from dimension 7.

An alternative proof of the theorem has been derived from the classification 
of finite simple groups and ensuing classification of finite 2-transitive groups 
(Cherlin, Mills). 

Nevertheless, the general trichotomy conjecture was refuted in a series of 
examples engineered by Hrushovski. 


6.6 The trichotomy principle. The weak trichotomy theorem was just one, 
technical, motivation for the trichotomy conjecture. There are more serious, 
conceptual, reasons to hope for a form of the trichotomy conjecture to be true. 
The main one is the undying intuition that the reality around us can be reduced 
to basic simple forms. A large structure that has a categorical description in a 
countable language may well be considered as one of those “simple forms” (see 
also a short discussion in 5.1), and one would expect that all such are known. 
So the artificial counterexamples constructed by Hrushovski in 1988 raised
the question whether this intuition is fundamentally wrong, or there is a way
to amend the conjecture or at least find a less alarming explanation of the
facts. Fortunately, the developments of the last 20 years strongly support the
latter notion. First, a very productive way to correct the initial conjecture has
been found (Zariski geometries, see below), and second, the counterexamples
have been to a great extent explained in terms of mainstream mathematical
structures, much in the spirit of the trichotomy conjecture.


6.7 Diophantine geometry. The model-theoretic geometric concepts intro- 
duced above are crucial for many applications. 

Consider a field $K$ and a subgroup $\Gamma$ of a commutative algebraic group 
$A(K)$. We say that $\Gamma$ has the Lang property if for every algebraic variety 
$V \subseteq A$, the intersection $\Gamma \cap V(K)$ is a union of finitely many cosets of 
subgroups of the form $\Gamma \cap B(K)$, for $B$ an algebraic subgroup of $A$. 

Now suppose $\Gamma \subseteq A^n$, where $A$ is a strongly minimal group definable in 
some expansion of the field $K$ (e.g., a differentially closed field or a difference 
field). For $A$ either (ii) or (iii) of 6.4 must hold, and provided that $A$ satisfies 
(ii) (is locally projective), it is easy to deduce that any definable subset 
of $A^n$ is a finite union of cosets of definable subgroups (a more general version 
of this was proved by Hrushovski and Pillay). It follows that $A^n$, and hence $\Gamma$, 
has the Lang property. In fact, the converse is also true: the Lang property of 
$\Gamma$ is equivalent to a more general version of (ii), called one-basedness. In 
particular, Faltings’ theorem stating that any finitely generated subgroup $\Gamma$ of a 
semiabelian variety $A(K)$, for $K$ of characteristic zero, has the Lang property 
is equivalent to the statement that the theory of an algebraically closed field $K$ 
expanded with the predicate for $\Gamma$ is superstable, with the geometry of $\Gamma$ 
one-based (A. Pillay). 


6.8 Zariski geometries. The original aim was to reformulate and strengthen 
the idea of a “simple form” behind the trichotomy conjecture. This is done by 
adding a topological component to what originally was a concept of pure logic. 
We now want to distinguish positively definable sets (definable without using 
the logical negation) from arbitrary ones. In an $L$-structure $M$ we call a subset 
$S \subseteq M^n$ closed if it is positively quantifier-free definable (using parameters). We 
denote by pr the projection $M^{n+m} \to M^n$, for any $n, m$, and write $\overline{S}$ for the 
closure of a subset $S \subseteq M^n$, the minimal closed set containing $S$, when such 
exists. We denote by $S(a)$ the fiber of $S$ over $a \in \mathrm{pr}\, S$ under the projection. 

A one-dimensional Zariski structure (also often Zariski geometry) is a 
strongly minimal structure M satisfying the following: 


(Z0) the closed sets form a Noetherian topology on $M^n$, for all $n \ge 1$. 

(Z1) $\mathrm{pr}\, S \supseteq \overline{\mathrm{pr}\, S} \setminus F$, for some proper closed subset $F \subset \overline{\mathrm{pr}\, S}$. 

(Z2) For $S$ a closed subset of $M^{n+1}$, there is $m$ such that for all $a \in M^n$, 
$S(a) = M$ or $|S(a)| \le m$. 

(Z3) Given a closed irreducible $S \subseteq M^n$, every irreducible component of the 
diagonal $S \cap \{x_i = x_j\}$ ($i < j \le n$) is of Morley rank at least $\operatorname{rk} S - 1$. 
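
As a sanity check on these axioms, one can look at the simplest case. The following sketch assumes, purely for illustration, that $M = F$ is an algebraically closed field with the Zariski-closed subsets of $F^n$ as closed sets; the sets $S_1$ and $S_2$ are hypothetical examples that do not appear in the text.

```latex
% Illustration only: M = F algebraically closed, "closed" = Zariski closed.
% (Z0) holds by Hilbert's basis theorem: descending chains of closed sets stabilize.

% (Z1) for S_1 = {(x, y) in F^2 : xy = 1}, projected to the first coordinate:
\[
  \mathrm{pr}\, S_1 = F \setminus \{0\}, \qquad
  \overline{\mathrm{pr}\, S_1} = F, \qquad
  \mathrm{pr}\, S_1 = \overline{\mathrm{pr}\, S_1} \setminus \{0\}.
\]

% (Z2) for S_2 = {(a, x) in F^2 : x^2 = a}, with fibers taken over a:
\[
  S_2(a) = \{\, x \in F : x^2 = a \,\}, \qquad
  |S_2(a)| \le 2 \quad \text{for all } a \in F .
\]
```

In the first example the exceptional closed set is $\{0\}$; in the second the uniform bound of (Z2) can be taken to be $m = 2$.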


In fact, one can equivalently reformulate this definition without assuming 
that the dimension in M is the Morley rank, that is, without assuming a priori 
that $M$ is strongly minimal. 

Similarly, but with a bit more work, one introduces the notion of a 
general (multidimensional) Zariski structure as a topological structure with 
a nice dimension notion. A key basic theorem then states that the theory of 
a Zariski structure $M$ allows elimination of quantifiers, is $\omega$-stable, and the 
Morley rank of $M$ is finite. 

Obvious examples of Zariski structures are smooth algebraic varieties over 
an algebraically closed field $F$, in the natural language for algebraic varieties 
(see 5.4). 

A less obvious class of Zariski structures is the class of compact complex 
manifolds $M$ in the language $L_{\mathrm{an}}$, whose basic $m$-ary relations correspond to 
analytic subsets $S \subseteq M^m$. Note that this class is essentially nonalgebraic and 
very diverse. The fact that each of the structures in this class is $\omega$-stable of 
finite Morley rank is quite surprising and is a good illustration of the power of 
the notion of a Zariski structure. 

One more class of examples comes from the theory $\mathrm{DCF}_0$ of differentially 
closed fields. A solution space for a differential equation $f(y) = 0$ in one variable 
of order n is a Zariski structure of dimension n (Hrushovski for n = 1, Pillay 
in general). A similar but more delicate statement is true for appropriate the- 
ories in positive characteristic (Hrushovski). The differentiation in this case is 
understood as the Hasse differentiation, a sequence of operators corresponding 
to orders of differentiation. 

The theory ACFA (see 5.9) is a source of another class of Zariski geometries. 
The structure induced on any strongly minimal subset of an algebraically closed 
difference field is Zariski (Hrushovski-Sokolovich). 

We say that a Zariski structure M is nonlinear if there is a strongly minimal 
subset in M of type 6.4(iii). 


6.9 Classification theorem for Zariski structures. (Hrushovski, Zilber 
1993) For any nonlinear Zariski geometry $M$ there are an algebraically closed 
field $F$ and a nonconstant continuous function 


$f : M \to F$. 


In particular, for a one-dimensional Zariski structure $M$ there are a smooth 
algebraic curve $C_M$ and a continuous finite covering map 


$p : M \to C_M(F)$; 


the image of any relation on $M$ is just a Zariski closed (algebraic) relation 
on $C_M$. 

The proof is in fact a reconstruction of algebraic geometry in M. We start 
in a universe “without numbers,” but with nicely interacting geometric objects 
such as curves, surfaces, and so on. It is possible then to develop in this universe 
a good intersection theory and an analysis of singularities, so that the notion 
of “a given branch of a curve a at the point p is tangent to a given branch of a 
curve b at p” is well defined. 

Now we look at a family of curves passing through a given point on the 
surface $X \times X$, where $X$ is a fixed curve, so that the curves from the family, 
or rather their branches, give rise to local functions $X \to X$ around a point. 
Composing the local functions and factorizing by the tangency relation, we get 
a one-dimensional group $(F, \cdot)$ with a Zariski structure on it. A similar 
construction with $F$ in place of $X$ gives a one-dimensional Zariski field $(F, +, \cdot)$, 
which has to be algebraically closed by Liouville’s argument. 

We then continue with the intersection theory and prove a form of Bézout’s 
theorem, which is used to prove the generalization of Chow’s theorem: every 
closed subset $S$ of the projective space $\mathbb{P}^n(F)$ is a zero-set of a system of 
homogeneous polynomial equations. 

The latter translates into the final statement of the classification theorem: 
the only relations on F induced from M are the constructible ones. 


6.10 Applications. A consequence of the classification theorem is that the 
trichotomy principle holds for strongly minimal structures definable in: 


(a) differentially closed fields of characteristic zero, 
(b) Hasse-differentially closed fields of positive characteristic p, 
(c) algebraically closed difference fields, 
(d) compact complex manifolds. 

Hrushovski used (a) to give a new proof of the Mordell–Lang conjecture 
for function fields in characteristic 0, (b) to formulate and prove the analogue 
of the Mordell–Lang conjecture for function fields of positive characteristic, 
and (c) to produce a new proof, with better than previously known numerical 
estimates, of the Lang property for torsion points of semiabelian varieties (the 
Manin–Mumford conjecture). 

Pillay and Ziegler used (d) to establish a useful connection between the 
classification theory of compact complex manifolds and the theory of differential 
fields. 


6.11 “New” stable structures. As mentioned above, the trichotomy 
conjecture is false in general. Hrushovski in 1988 introduced a construction 
that produced a series of unexpected strongly minimal, and more generally 
stable, structures for which the trichotomy principle fails. 

Suppose we have a class $\mathcal{H}$ of strongly minimal $L$-structures with the 
(combinatorial) dimension $d(X)$ for finite subsets $X$ of the structures. We want to 
introduce a new function or relation on $M \in \mathcal{H}$ so that the new structure gets 
a good notion of dimension. 

Hrushovski observed that this can be done using the principle of free fusion. 
That is, the new function should be related to the old structure in as free a 
way as possible. A more precise form of this principle states that the number of 
explicit dependencies in X in the new structure must not be greater than d(X). 

The explicit $L$-dependencies on $X$ can be counted as the $L$-codimension, 
$|X| - d(X)$. The explicit dependencies induced by a new relation are those 
given by simplest “equations,” that is, basic formulas. 

So, for example, if we want a new unary function $f$ on a field, the condition 
should be 

$\operatorname{trd}(X \cup f(X)) - |X| \ge 0$,    (8) 


since in the set $Y = X \cup f(X)$ the number of explicit field dependencies is 
$|Y| - \operatorname{trd}(Y)$, and the number of explicit dependencies in terms of $f$ (those of 
the form $f(x_1) = x_2$) is $|X|$. We call the counting function 
$\delta(X) = \operatorname{trd}(X \cup f(X)) - |X|$ a predimension in $(M, f)$. 
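
To see what condition (8) says in the smallest case, one can evaluate the predimension on a one-element set. The sketch below assumes that $\operatorname{trd}$ is the transcendence degree over the prime field and reads the counting function above as $\delta(X) = \operatorname{trd}(X \cup f(X)) - |X|$; the element $x$ is hypothetical.

```latex
% X = {x}, so delta({x}) = trd(x, f(x)) - 1.
\[
  \delta(\{x\}) =
  \begin{cases}
    \phantom{-}1 & \text{if } x \text{ and } f(x) \text{ are algebraically independent},\\
    \phantom{-}0 & \text{if } \operatorname{trd}(x, f(x)) = 1,\\
    -1 & \text{if } x \text{ and } f(x) \text{ are both algebraic over the prime field}.
  \end{cases}
\]
% Condition (8) on singletons therefore excludes exactly the last case.
```

So already for singletons the inequality is a genuine restriction: a function satisfying (8) cannot send an algebraic element to an algebraic element.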

In general, we think of a fusion between two structures, $(M, L_1)$ and 
another one that lives on the same domain, say $(M, L_2)$. Both structures 
carry a combinatorial pregeometry, with notions of dimension $d_1(X)$ and $d_2(X)$ 
respectively. Then the predimension $\delta(X)$ in the new structure $(M, L_1 \cup L_2)$ is 
a simple linear combination of $d_1$ and $d_2$, in fact uniquely determined by the 
free fusion principle. 


6.12 Now we consider the new class of structures $\mathcal{H}_\delta$ consisting of all the 
$(M, L_1 \cup L_2)$ satisfying the Hrushovski inequality: 


$\delta(X) \ge 0$ for any finite $X \subseteq M$. 


The next clever idea is to choose in the class $\mathcal{H}_\delta$ a structure that is 
algebraically closed in the class. A way of defining the notion of algebraic 
(existential) closedness in a class is well known in model theory. The prototypes 
are algebraically closed fields, differentially closed fields, algebraically closed 
difference fields considered above, and many others. 

To define algebraically closed objects in $\mathcal{H}_\delta$, Hrushovski first introduces the 
notion of strong embedding $A \le B$ in the class. This means that $A \subseteq B$ and 
for every finite $X \subseteq A$, 


$\min\{\delta(Y) : X \subseteq Y,\ Y \subseteq A \text{ finite}\} = \min\{\delta(Y) : X \subseteq Y,\ Y \subseteq B \text{ finite}\}$, 


that is, all dependencies between elements of $A$ occurring in $B$ can be detected 
already in $A$. 

A structure $M^{\#}$ is said to be algebraically closed in $\mathcal{H}_\delta$ if any finite 
quantifier-free type over $M^{\#}$ realized in a strong extension of $M^{\#}$ is already 
realized in $M^{\#}$. 

Provided that $\mathcal{H}_\delta$ satisfies certain conditions, any two $\mathcal{H}_\delta$-algebraically 
closed structures are elementarily equivalent, and often their common theory is 
stable and even $\omega$-stable. In the latter case, if $M^{\#}$ is such an $\mathcal{H}_\delta$-algebraically 
closed structure, $M^{\#}$ becomes a homogeneous pregeometry with the (combinatorial) 
dimension $\partial$ defined as follows: 


$\partial(X) = \min\{\delta(Y) : X \subseteq Y,\ Y \subseteq M^{\#} \text{ finite}\}$. 


6.13 Although at this step of Hrushovski’s construction we have obtained a 
new homogeneous pregeometry, our aim is not yet achieved. The structure $M^{\#}$ 
is not strongly minimal. Typically $M^{\#}$ is quasiminimal in the following sense: 
the structure $M^{\#}$ is uncountable but every definable subset $S \subseteq M^{\#}$ is either 
countable or a complement of a countable one. So at the last stage of the 
construction one applies to $M^{\#}$ a very delicate method called collapse: it chooses, 
following one of continuum many procedures $\mu$ inside $M^{\#}$, a substructure $M_\mu$, 
with a smaller domain, which is strongly minimal. Remarkably, the pregeometry 
of $M_\mu$ agrees with the pregeometry of $M^{\#}$, that is, the notion of dependence 
in the substructure is the same as in the ambient structure. In particular, the 
predimension and notions of dimension in $M_\mu$ are defined exactly as in 6.12. 

Thus we get continuum many new strongly minimal nonlinear structures 
and pregeometries. 


6.14 The discovery of the new strongly minimal structures in 1988 was an 
obvious challenge to the views and hopes expressed in 5.1 and 6.6. The suc- 
cess with the classification of Zariski geometries mitigated the disappointment, 
but nevertheless, the question whether the new structures are mathematical 
pathologies or a part of a bigger picture remained. 


6.15 Schanuel’s conjecture. A crucial breakthrough came with the following 
observation. 

Let the original class $\mathcal{H}$ in 6.11 be the class of algebraically closed fields $F$ of 
characteristic 0 and suppose we want to add a new function, called suggestively 
ex, to the field. We want the new function to be a homomorphism between the 
two group structures on $F$, that is, 


$\mathrm{ex}(x_1 + x_2) = \mathrm{ex}(x_1) \cdot \mathrm{ex}(x_2)$.    (9) 


The free fusion principle then uniquely determines that the predimension $\delta$ for 
this class has to be 


$\delta(X) = \operatorname{trd}(X \cup \mathrm{ex}\, X) - \operatorname{ldim} X$, for any finite $X \subseteq F$, 


where $\operatorname{ldim} X$ is the dimension of the $\mathbb{Q}$-vector space generated by $X$. Now 
observe that the Hrushovski inequality of 6.12 is equivalent to 


$\operatorname{trd}(x_1, \ldots, x_n, \mathrm{ex}\, x_1, \ldots, \mathrm{ex}\, x_n) \ge n$, for $\mathbb{Q}$-linearly independent $x_1, \ldots, x_n$, 


which is exactly the Schanuel conjecture for the exponentiation $\mathrm{ex} = \exp$, 
$F = \mathbb{C}$, the central conjecture of transcendental number theory. 
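
Two standard special cases may help to calibrate the inequality. They are well-known facts of transcendence theory added here for illustration and are not part of the original argument; the choices $x_1 = 1$ and $x_2 = \pi i$ are ours.

```latex
% n = 1, x_1 = 1 (nonzero, hence linearly independent over Q):
\[
  \operatorname{trd}(1, e) \ge 1
  \iff e \text{ is transcendental (a theorem of Hermite).}
\]

% n = 2, x_1 = 1, x_2 = \pi i (independent over Q, since \pi i is not rational):
\[
  \operatorname{trd}(1, \pi i, e, e^{\pi i}) = \operatorname{trd}(\pi, e) \ge 2
  \iff e \text{ and } \pi \text{ are algebraically independent (open).}
\]
```

The second line uses $e^{\pi i} = -1$ and the fact that $i$ is algebraic, so only $\pi$ and $e$ can contribute to the transcendence degree.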

Variations of Schanuel’s conjecture, e.g., for elliptic functions, are also well 
known and indeed can be written in the form of Hrushovski’s inequality. It 
looks credible that the Hrushovski inequality properly applied is just the most 
general form of a Schanuel-type conjecture. 


6.16 Pseudoexponentiation. In the particular case of the class $\mathcal{H}(\mathrm{ex})$ 
described above this author has carried out the steps 6.11 and 6.12 of Hrushovski’s 
construction (with some modifications). The resulting class of structures, called 
algebraically closed fields with pseudoexponentiation, ACFExp, has the 
following properties: 


(i) ACFExp is axiomatizable by an explicit list of (not first-order) formulas, 
stating 

(a) the validity of Schanuel’s conjecture and 

(b) that any system of $n$ independent exponential-polynomial equations in $n$ 
variables that does not directly contradict Schanuel’s conjecture has a regular 
zero, but not more than countably many; 


(ii) ACFExp is categorical in uncountable powers $\kappa$, that is, for every such $\kappa$ 
there is a unique, up to isomorphism, algebraically closed field with 
pseudoexponentiation of cardinality $\kappa$; 

(iii) An algebraically closed field with pseudoexponentiation carries a 
homogeneous pregeometry, in particular, any bijection between two bases of 
the pregeometry can be extended to an automorphism of the field with 
pseudoexponentiation. 


A consequence of the theorem is that Schanuel’s conjecture is consistent 
with the field-theoretic algebra. The categoricity statement (ii) and homogeneity 
statement (iii) strengthen this further: not only is Schanuel’s conjecture 
consistent, but along with other axioms, it also makes the algebra of the 
structure uniquely nice. 

These simple arguments suggest the following. 


6.17 Conjecture. The unique algebraically closed field with pseudoexponentiation 
of power the continuum is isomorphic to $(\mathbb{C}, +, \cdot, \exp)$, the complex field 
with exponentiation. 

Clearly this conjecture implies Schanuel’s conjecture. But there is also the 
part (b) in the axioms of ACFExp, which leads to the formulation of a new 
conjecture: 

$(\mathbb{C}, +, \cdot, \exp)$ is algebraically closed as a field with exponentiation. 

The precise meaning of the assumption (b) can be found in the original 
paper. We present here a theorem supporting the conjecture; that is, the 
statement of the theorem is a formal corollary of the conjecture. 


Theorem (W. Henson and L. Rubel, 1983) Let $f(x)$ be a term in one 
variable in the language $(+, \cdot, \exp)$ with constant symbols for complex numbers. 
Assume that $f(x)$ is not of the form $e^{g(x)}$, where $g(x)$ is another such term. 
Then the equation $f(x) = 0$ has a solution in $\mathbb{C}$. 


6.18 A test for Schanuel’s conjecture. The model-theoretic interpretation 
of Schanuel’s conjecture has the advantage of the utmost generality. We can, 
for example, look for the simplest version of a Schanuel-like conjecture with 
the hope to test its validity. (Note that no natural version of a Schanuel-like 
conjecture has been proven so far.) 

Apparently the easiest form of a Schanuel-like conjecture is for an analytic 
function $f(x)$ on $\mathbb{C}$ that satisfies no functional equation. In this case the 
Hrushovski inequality must have the form (8), Section 6.12. Does such a function 
exist? If yes, is the structure $(\mathbb{C}, +, \cdot, f)$ algebraically closed in the appropriate 
sense? 

Both questions have positive answers. A. Wilkie has shown that an entire 
analytic function given as 


$f(x) = \sum_{n \ge 0} \frac{x^n}{a_n}$ 


with $a_n$ very rapidly increasing integers (e.g., $a_n$ = 22”') satisfies the Hrushovski 
inequality. P. Koiran proved that the structure is algebraically closed. 


7 Other Languages and Nonelementary Model Theory 


The second-order languages such as L2Reals proved unsuitable for a 
model-theoretic analysis, so various other, tamer, extensions of first-order 
languages were considered. Among the most natural ones are the languages 
$L_{\lambda,\mu}$, for cardinal numbers $\lambda$ and $\mu$, which allow quantification over sequences 
of variables of length $< \mu$ and Boolean operations over sets of formulas of 
cardinality $< \lambda$. 

These languages can be further enhanced by allowing, say, the quantifier $Q$, 
which in expressions of the form $Qx\, P(x)$ has the meaning “there exist at least 
$\aleph_1$-many $x$ such that $P(x)$.” 

The main difficulty in studying these languages is the failure of any form of 
the compactness theorem. 

Some progress in the study of these languages was achieved in the 1960s and 
1970s, but further attempts, in particular in the spirit of classification theory 
of Sections 5 and 6, led to a complete rethinking of the approach to non-first- 
order model theory. Shelah introduced the new concept of abstract elementary 
classes, which is not based on any class of logic formulas. 


7.1 Definition. Given cardinals $\lambda$ and $\mu$ and an alphabet $L$, $L_{\lambda,\mu}(L)$ is the 
smallest collection of formulas that contains all atomic $L$-formulas in the 
variables $v_\alpha$, $\alpha < \mu$, and is closed under taking $\neg$, applying universal 
quantifiers to a string of variables $\forall v_{\alpha_1} \cdots \forall v_{\alpha_\beta} \cdots P$, applying existential 
quantifiers to a string of variables $\exists v_{\alpha_1} \cdots \exists v_{\alpha_\beta} \cdots P$, and applying 
disjunction $\bigvee_\alpha P_\alpha$ or conjunction $\bigwedge_\alpha P_\alpha$ to fewer than $\lambda$ formulas. 

The interpretation of $L_{\lambda,\mu}(L)$-formulas in $L$-structures is defined along the 
same lines as that for first-order formulas. 

A formula of the language $L_{\infty,\mu}(L)$ is a formula of the language $L_{\lambda,\mu}(L)$, for 
some $\lambda$. 

The language $L^{Q}_{\infty,\mu}(L)$ is obtained by allowing the use, along with formulas 
of $L_{\infty,\mu}(L)$, of the quantifier $Q$, with the interpretation explained above. 

An example of the possible use of these languages is the axiomatization 
in 6.16. The axioms in (i)(a) require $L_{\omega_1,\omega}$, and those in (b), $L^{Q}_{\omega_1,\omega}$. 

The following is one of the basic results about infinitary languages; compare 
with the Ehrenfeucht–Fraïssé criterion. 


7.2 Theorem (C. Karp) Two $L$-structures $A$ and $B$ are $L_{\infty,\omega}(L)$-equivalent if 
and only if there is a back-and-forth system between $A$ and $B$ (definition 3.14). 

When $A$ and $B$ are countable we have a corollary that $L_{\infty,\omega}(L)$-equivalence 
amounts to an isomorphism between the structures. A stronger result is the 
following categoricity result. 


Theorem (D. Scott). Given a countable $L$ and a countable $L$-structure $A$, 
there is an $L_{\omega_1,\omega}(L)$-sentence $\sigma(A)$ true in $A$ and such that any countable 
model of $\sigma(A)$ is isomorphic to $A$. 

Note how this theorem emphasizes the special effect of categoricity in small 
cardinals, or cardinals small compared to the cardinality of the whole language, 
the set of all formulas. For a first-order language exactly the same statement 
holds when we replace “countable” by “finite.” In fact, this very effect explains 
why the categoricity in uncountable cardinalities has given an impetus to the 
richest part of modern model theory, the first-order stability theory (Section 5). 


7.3 Löwenheim–Skolem theorems for $L_{\lambda,\mu}$ and other languages. The 
situation here is much more complex than for the first-order languages. The 
downward Löwenheim–Skolem theorem holds but in a restricted form. Say, for a 
countable $L$, an infinite $L$-structure $A$, an infinite cardinal $\kappa \le \operatorname{card} A$, and each 
$L_{\omega_1,\omega}$-sentence $P$ that holds in $A$ there is an $L$-substructure $B \subseteq A$ of 
cardinality $\kappa$ such that $B \models P$. 

The proof uses the Skolem functions much in the same way as in the 
first-order case, see 3.4. 

But the analogue of the upward theorem is not true. There are 
$L_{\omega_1,\omega}$-sentences that have models, but none above a certain cardinality. For 
example, in the language of arithmetic extended by a unary predicate $N$ and 
a binary predicate $\varepsilon$ we can state in the form of an $L_{\omega_1,\omega}$-sentence $\Phi$ that the 
predicate $N$ defines the subset $N$ of the model such that $(N, +, \cdot, 0, 1)$ is the 
standard arithmetic; 

if $x \mathrel{\varepsilon} y$ holds then $x \in N$ and $y \notin N$; moreover, 

$\forall y_1, y_2 \notin N\ \bigl(y_1 = y_2 \leftrightarrow \forall x \in N\ (x \mathrel{\varepsilon} y_1 \leftrightarrow x \mathrel{\varepsilon} y_2)\bigr)$. 

Clearly this sentence has models at most of cardinality $2^{\aleph_0}$. 
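
The bound can be spelled out by a standard counting argument. In the sketch below the name $M$ for a model of the sentence is introduced only for the illustration, and the last clause of the sentence is read as extensionality: two elements outside $N$ with the same $\varepsilon$-members in $N$ are equal.

```latex
% Each element outside N is determined by the set of its members in N:
\[
  M \setminus N \longrightarrow \mathcal{P}(N), \qquad
  y \longmapsto \{\, x \in N : x \mathrel{\varepsilon} y \,\}
\]
% This map is injective by extensionality, and N is countable, so
\[
  \operatorname{card} M \;\le\; \aleph_0 + 2^{\aleph_0} \;=\; 2^{\aleph_0}.
\]
```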

One can extend this method to obtain sentences with models of cardinalities 
bounded by $2^{2^{\aleph_0}}, 2^{2^{2^{\aleph_0}}}, \ldots$ 


For the general $L_{\lambda,\mu}$-language the situation is even more complex. 


7.4 Categoricity for $L_{\omega_1,\omega}$ in uncountable cardinals. This problem was 
first attacked by J. Keisler in the 1970s, in an attempt to extend the Morley 
theory to $L_{\omega_1,\omega}$. Keisler proved that the main results go through provided 
one can establish the fact that models of an $L_{\omega_1,\omega}$-sentence categorical in an 
uncountable cardinal are homogeneous, which is of course the case for 
first-order languages. But shortly after Keisler’s work appeared, counterexamples to 
this assumption were found. More recently, examples of uncountably categorical 
$L_{\omega_1,\omega}$-sentences with nonhomogeneous uncountable models were found in the 
context of mainstream mathematics. 


7.5 Example. Consider the structure on the complex numbers 


$\mathcal{C} = (\mathbb{C}, +, p)$, where $p(x, y, z) \equiv e^x + e^y = e^z$. 

Notice that the subgroup $2\pi i\mathbb{Z}$ is definable in this structure as 

$\{v \in \mathbb{C} : \forall x, y, z\ (e^x + e^y = e^z \leftrightarrow e^x + e^y = e^{z+v})\}$. 
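
A short check that this set is indeed $2\pi i\mathbb{Z}$ may be useful. The sketch assumes the defining condition is read as: for all $x, y, z$, the equation $e^x + e^y = e^z$ holds if and only if $e^x + e^y = e^{z+v}$ holds.

```latex
% (=>) Put x = y = 0 and choose z with e^z = 2. Then the left side holds, so
\[
  e^{0} + e^{0} = 2 = e^{z}
  \;\Longrightarrow\; 2 = e^{z+v} = 2\,e^{v}
  \;\Longrightarrow\; e^{v} = 1
  \;\Longrightarrow\; v \in 2\pi i\,\mathbb{Z}.
\]
% (<=) If v is in 2 pi i Z, then e^{z+v} = e^{z} for every z,
%      so the two equations in the condition are literally the same.
```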


Now, if we introduce a definable set $\mathbb{C}^* = \mathbb{C}/2\pi i\mathbb{Z}$ and a definable 
canonical homomorphism $\exp : \mathbb{C} \to \mathbb{C}^*$ we get an equivalent representation of the 
structure as a two-sorted structure $(\mathbb{C}, \mathbb{C}^* \cup \{0\})$ with the additive group 
structure $(\mathbb{C}, +)$ on the first sort, the field structure $(\mathbb{C}^* \cup \{0\}, \cdot, +)$ on the second 
sort, and exp mapping the first sort into the second sort. We can describe this 
structure by an $L_{\omega_1,\omega}$-sentence $\Sigma$ saying that: 


- $(\mathbb{C}, +)$ is a divisible torsion-free group; 

- $\mathbb{C}^* \cup \{0\}$ with respect to $+$ and $\cdot$ is an algebraically closed field of 
characteristic 0; 

- the kernel of exp is an infinite cyclic group. 
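
The last condition is the one that is not first-order, but it can be written with a single countable disjunction. The following is a sketch of one possible formulation, in which $ma$ abbreviates the $m$-fold sum of $a$ (or of $-a$ for negative $m$) and $1$ is the multiplicative identity of the second sort; this particular formula is our illustration and is not quoted from the source.

```latex
\[
  \exists a \,\Bigl( a \neq 0 \;\wedge\;
    \forall v \,\bigl( \exp(v) = 1 \;\leftrightarrow\;
      \bigvee_{m \in \mathbb{Z}} v = m a \bigr) \Bigr)
\]
```

Since the first sort is torsion-free, a nonzero $a$ generates an infinite cyclic group, so the formula says exactly that the kernel is infinite cyclic.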


It takes nontrivial algebra (theory of fields) in combination with model 
theory to prove that this sentence has a unique, up to isomorphism, model in every 
uncountable cardinality. But no such model is homogeneous. 


7.6 Abstract elementary classes. Shelah, who has been in the forefront of 
studies in non-first-order model theory, was the first to realize that the syntactic 
specification of non-first-order languages has little relevance to model theory, 
and that what matters more are the algebraic characteristics of classes of models, 
which eventually depend more on the meaning of specific axioms than on the syntax of 
the language. This resulted in the following definition. 

A class of $L$-structures $K$ equipped with a notion of “strong submodel” $\preceq$ is 
said to be an abstract elementary class (AEC) if the class $K$ and the class of pairs 
satisfying the binary relation $\preceq$ are each closed under isomorphism and satisfy 
the following conditions: 


(a) If $A \preceq B$ then $A \subseteq B$. 
(b) $\preceq$ is a partial order on $K$. 
(c) If $\{A_i : i < \delta\}$ is a $\preceq$-increasing chain in $K$ closed under limits, then: 
(i) $A_\delta = \bigcup_{i<\delta} A_i \in K$; 
(ii) for each $j < \delta$, $A_j \preceq A_\delta$; 
(iii) if each $A_i \preceq B \in K$ then $A_\delta \preceq B$. 
(d) If $A, B, C \in K$, $A \preceq C$, $B \preceq C$, and $A \subseteq B$, then $A \preceq B$. 
(e) There is a (Löwenheim–Skolem) cardinal number $\mathrm{LS}(K)$ such that if 
$A \subseteq B \in K$, there is an $A' \in K$ with $A \subseteq A' \preceq B$ and $\operatorname{card} A' \le \operatorname{card} A + \mathrm{LS}(K)$. 


7.7 Examples. 


(a) Any first-order axiomatizable class of $L$-structures with respect to $\preceq$, the 
elementary embedding, is an AEC. 

(b) The class of models of the $L_{\omega_1,\omega}$-sentence $\Sigma$ in 7.5 with respect to the 
embedding $\subseteq$ is an AEC. 

(c) The class $\mathcal{H}_\delta$ emerging in Hrushovski’s construction with respect to the 
strong embedding, see 6.12, is an AEC. 

(d) The class of fields with pseudoexponentiation is an AEC with respect to the 
strong embedding corresponding to the Schanuel predimension 
$\delta(X) = \operatorname{trd}(X \cup \mathrm{ex}\, X) - \operatorname{ldim}_{\mathbb{Q}} X$; see 6.16. 


The theory of abstract elementary classes brought model theory closer to 
the tradition of abstract algebra but enriched with the vast technology of 
classification theory. The most powerful results of the theory are the 
following. 


7.8 Theorem (S. Shelah). There is a Hanf number $\mu$ (not computed but 
depending only on the Löwenheim–Skolem number $\mathrm{LS}(K)$) such that if an AEC 
$K$ has arbitrarily large models and satisfies the amalgamation property and the 
joint embedding property for its models, then provided that $K$ is categorical in a 
successor cardinal larger than $\mu$, it is categorical in all larger cardinals. 


In a more specific situation we have the following. 


7.9 Theorem (S. Shelah). Assume the mild set-theoretic assumption 
$2^{\aleph_n} < 2^{\aleph_{n+1}}$ for all natural $n$. Let $\Sigma$ be an $L_{\omega_1,\omega}$-sentence that is 
categorical in $\aleph_n$ for every $n$. Then $\Sigma$ has a unique model in every infinite cardinal. 

For further reading on the subject of infinitary languages and AEC see 
J. Baldwin’s book [14]. 